Netdev List
* Re: Null pointer dereference in icmp_send
From: Eric Dumazet @ 2011-05-16 21:27 UTC
  To: Aristide Fattori; +Cc: netdev, roberto.paleari
In-Reply-To: <BANLkTi=uBDOQOJJMGn6V0Ne7OjQ7kGc-2w@mail.gmail.com>

On Monday, 16 May 2011 at 23:06 +0200, Aristide Fattori wrote:
> Hi everybody,
> 
> in function icmp_send() (net/ipv4/icmp.c), the parameter passed to
> dev_net() function is not properly validated. This can lead to a NULL
> pointer dereference that crashes the kernel. The bug can be triggered
> remotely, by flooding the target with fragmented IPv4 packets.
> Important fields in the IP packet are:
>  * Flags: the MF flag must be set.
>  * Fragment ID: using pseudo-random values for this field quickly
> fills the fragment queues in the victim's kernel, as it cannot
> easily reassemble the received packets.
>  * TOS: using pseudo-random values for this field triggers the
> creation of more than one route cache entry for the same destination
> address, increasing the chances of hitting the error condition
> described above.
> Other fields of the packet do not really matter, and they can be set
> to arbitrary values.
> 
> If you are interested, we can provide a small and very dirty python
> script that easily triggers the error condition.
> 

Hi

You forgot to tell us which Linux version you used.

We have had some fixes in this area lately.

Thanks




* Re: kernel bug relating to networking
From: David Miller @ 2011-05-16 21:27 UTC
  To: bhutchings; +Cc: bjlockie, linux-kernel, netdev
In-Reply-To: <1305580825.2885.49.camel@bwh-desktop>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 16 May 2011 22:20:25 +0100

> On Mon, 2011-05-16 at 16:47 -0400, James wrote:
>> On 05/16/11 11:47, Daniel Baluta wrote:
>> > On Mon, May 16, 2011 at 6:35 PM, James <bjlockie@lockie.ca> wrote:
>> >> I originally posted to linux-net@vger.kernel.org but all that list
>> > This should be netdev@vger.kernel.org.
>> That is confusing since the welcome message said: "welcome message for
>> linux-net@vger.kernel.org".
>> 
>> Both lists are listed at http://vger.kernel.org/vger-lists.html but it
>> doesn't say the difference.
> 
> linux-net: spam and unanswered questions
> netdev: actual discussions
> 
> David, is it not time to stop the confusion by shutting down linux-net?

Agreed, linux-net has been deleted.


* Re: [PATCH V5 4/6 net-next] vhost: vhost TX zero-copy support
From: Michael S. Tsirkin @ 2011-05-16 21:24 UTC
  To: Shirley Ma
  Cc: David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann, netdev,
	kvm, linux-kernel
In-Reply-To: <1305579414.3456.49.camel@localhost.localdomain>

On Mon, May 16, 2011 at 01:56:54PM -0700, Shirley Ma wrote:
> On Mon, 2011-05-16 at 23:45 +0300, Michael S. Tsirkin wrote:
> > > +/* Since we need to keep the order of used_idx as avail_idx, it's possible that
> > > + * DMA done not in order in lower device driver for some reason. To prevent
> > > + * used_idx out of order, upend_idx is used to track avail_idx order, done_idx
> > > + * is used to track used_idx order. Once lower device DMA done, then upend_idx
> > > + * can move to done_idx.
> > 
> > Could you clarify this please? virtio explicitly allows out of order
> > completion of requests. Does it simplify code that we try to keep
> > used index updates in-order? Because if not, this is not
> > really a requirement.
> 
> Hello Mike,
> 
> Based on my testing, vhost_add_used() must be called in the same order as
> vhost_get_vq_desc(). Otherwise, the virtio_net ring seems to get double
> freed.

Double-freed, or do you get NULL below?

> I didn't spend time on debugging this.
> 
> in virtqueue_get_buf
> 
>         if (unlikely(!vq->data[i])) {
>                 BAD_RING(vq, "id %u is not a head!\n", i);
>                 return NULL;
>         }

Yes, but the i used here is the head id that we read from the
used ring, not the ring index itself:
	i = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].id
We must complete each id only once, but in any order.

> That's the reason I created the upend_idx and done_idx.
> 
> Thanks
> Shirley

Very strange. It sounds like a bug, but I can't tell where: in the
host or in the guest. If it's in the guest, we must fix it.
If it's in the host, we should only fix it if that makes life simpler for us.
Could you try to nail it down, please? Another question: will the code get
simpler or more complex if that restriction is removed?

-- 
MST
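
A minimal sketch of the point made above, assuming a much-simplified ring:
each used-ring slot stores a head id, and the guest looks up its token by
that id, so ids may complete in any order as long as each completes only
once. The toy_* names, the ring size and the broken flag are hypothetical
stand-ins for illustration, not the virtio or vhost code under review.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_NUM 4u   /* hypothetical ring size for this sketch */

/* Simplified stand-in for one used-ring element: the head id the host completed. */
struct toy_used_elem {
	unsigned int id;
	unsigned int len;
};

/* Simplified stand-in for the guest's virtqueue state. */
struct toy_vq {
	struct toy_used_elem used_ring[TOY_RING_NUM]; /* written by the host */
	unsigned short used_idx;       /* host producer index */
	unsigned short last_used_idx;  /* guest consumer index */
	void *data[TOY_RING_NUM];      /* per-head token; NULL means "not outstanding" */
	int broken;                    /* set when a bad id is seen */
};

/* Host side: publish a completed head id in the next used slot. */
static void toy_add_used(struct toy_vq *vq, unsigned int head, unsigned int len)
{
	struct toy_used_elem *e = &vq->used_ring[vq->used_idx % TOY_RING_NUM];

	e->id = head;
	e->len = len;
	vq->used_idx++;
}

/* Guest side: loosely modelled on the check quoted in the thread. */
static void *toy_get_buf(struct toy_vq *vq)
{
	unsigned int i;
	void *ret;

	if (vq->broken || vq->last_used_idx == vq->used_idx)
		return NULL;

	/* i is the head id stored in the slot, not the slot index itself. */
	i = vq->used_ring[vq->last_used_idx % TOY_RING_NUM].id;
	if (i >= TOY_RING_NUM || !vq->data[i]) {
		vq->broken = 1;        /* the "id %u is not a head!" case */
		return NULL;
	}

	ret = vq->data[i];
	vq->data[i] = NULL;            /* this head may not be completed again */
	vq->last_used_idx++;
	return ret;
}

/* Returns 0 if out-of-order completion works and a repeated id is rejected. */
static int toy_demo(void)
{
	static int bufs[3];
	struct toy_vq vq = { 0 };

	vq.data[0] = &bufs[0];
	vq.data[1] = &bufs[1];
	vq.data[2] = &bufs[2];

	/* Heads 0, 1, 2 were made available in that order; complete them as 2, 0, 1. */
	toy_add_used(&vq, 2, 0);
	toy_add_used(&vq, 0, 0);
	toy_add_used(&vq, 1, 0);
	if (toy_get_buf(&vq) != &bufs[2] ||
	    toy_get_buf(&vq) != &bufs[0] ||
	    toy_get_buf(&vq) != &bufs[1])
		return -1;

	/* Completing head 2 a second time hits the "not a head" path. */
	toy_add_used(&vq, 2, 0);
	if (toy_get_buf(&vq) != NULL || !vq.broken)
		return -1;

	return 0;
}
```

In this toy model the "not a head" path is reached only when an id is
completed twice (or was never outstanding), not when ids complete out of
order. That is the distinction the question above is trying to pin down.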


* Re: Null pointer dereference in icmp_send
From: David Miller @ 2011-05-16 21:22 UTC
  To: joystick; +Cc: netdev, roberto.paleari
In-Reply-To: <BANLkTi=uBDOQOJJMGn6V0Ne7OjQ7kGc-2w@mail.gmail.com>

From: Aristide Fattori <joystick@idea.sec.dico.unimi.it>
Date: Mon, 16 May 2011 23:06:32 +0200

> in function icmp_send() (net/ipv4/icmp.c), the parameter passed to
> dev_net() function is not properly validated.

It doesn't need to be.

If 'rt' is not NULL, then rt->dst.dev is always not NULL and
therefore no checks are necessary.
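
A minimal sketch of the invariant argued above, assuming (as this reply
states) that a route is never set up with a NULL device. The toy_* types and
functions are invented stand-ins for illustration and do not reproduce the
kernel's structures or icmp_send().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the field names echo the thread, the layouts are invented. */
struct toy_net { int id; };
struct toy_net_device { struct toy_net *nd_net; };
struct toy_dst_entry { struct toy_net_device *dev; };
struct toy_rtable { struct toy_dst_entry dst; };

/* Like dev_net() in the report: dereferences dev unconditionally. */
static struct toy_net *toy_dev_net(const struct toy_net_device *dev)
{
	return dev->nd_net;
}

/* The only way to set up a route in this sketch: it refuses a NULL device. */
static int toy_rt_init(struct toy_rtable *rt, struct toy_net_device *dev)
{
	if (!dev)
		return -1;
	rt->dst.dev = dev;
	return 0;
}

/* Consumer: checks rt only and relies on the invariant for rt->dst.dev. */
static struct toy_net *toy_route_net(const struct toy_rtable *rt)
{
	if (!rt)
		return NULL;
	return toy_dev_net(rt->dst.dev);
}
```

The consumer is safe only while every path that creates a route enforces the
invariant. If some path left dst.dev unset, the unconditional dereference
would crash, which is the behaviour the original report describes.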


* Re: kernel bug relating to networking
From: Ben Hutchings @ 2011-05-16 21:20 UTC
  To: James, David Miller; +Cc: Kernel Mailing List, netdev
In-Reply-To: <4DD18D52.9070302@lockie.ca>

On Mon, 2011-05-16 at 16:47 -0400, James wrote:
> On 05/16/11 11:47, Daniel Baluta wrote:
> > On Mon, May 16, 2011 at 6:35 PM, James <bjlockie@lockie.ca> wrote:
> >> I originally posted to linux-net@vger.kernel.org but all that list
> > This should be netdev@vger.kernel.org.
> That is confusing since the welcome message said: "welcome message for
> linux-net@vger.kernel.org".
> 
> Both lists are listed at http://vger.kernel.org/vger-lists.html but it
> doesn't say the difference.

linux-net: spam and unanswered questions
netdev: actual discussions

David, is it not time to stop the confusion by shutting down linux-net?

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* pull request: wireless-next-2.6 2011-05-16
From: John W. Linville @ 2011-05-16 21:13 UTC
  To: davem; +Cc: linux-wireless, netdev

Dave,

Still another big batch of wireless LAN stuff intended for 2.6.40 -- the
wireless folks have really been eating their Wheaties this cycle!

Highlights of this batch include a new driver in the rtlwifi family,
some new AMBA-like bus infrastructure that is specific to Broadcom
devices, a Bluetooth pull from Gustavo and friends, a wl12xx pull from
Luca and friends, some mesh updates from the Cozybit folks, some more
fixups from the mwifiex team, a collection of mac80211 improvements from
Johannes, the usual flutter of patches around iwlwifi and ath9k, and a
spread of other updates.

Please let me know if there are problems!

Thanks,

John

---

The following changes since commit 1a8218e96271790a07dd7065a2ef173e0f67e328:

  net: ping: dont call udp_ioctl() (2011-05-16 11:49:39 -0400)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6.git for-davem

Amitkumar Karwar (4):
      mwifiex: fix simultaneous assoc and scan issue
      mwifiex: remove unnecessary struct mwifiex_opt_sleep_confirm_buffer
      mwifiex: remove redundant local structures
      mwifiex: remove mwifiex_recv_complete function

Andy Ross (1):
      Bluetooth: Device ids for ath3k on Pegatron Lucid tablets

Arik Nemtsov (13):
      wl12xx: implement the tx_frames_pending mac80211 callback
      wl12xx: discard corrupted packets in RX
      wl12xx: add BT-coexistance for AP
      wl12xx: use wiphy values for setting rts, frag thresholds on init
      wl12xx: AP-mode - disable beacon filtering on start up
      wl12xx: schedule recovery on command timeout
      wl12xx: print firmware program counter during recovery
      wl12xx: AP-mode - overhaul rate policy configuration
      wl12xx: AP-mode - reconfigure templates after basic rates change
      wl12xx: add debugfs entry for starting recovery
      wl12xx: fix race condition during recovery in AP mode
      wl12xx: export driver state to debugfs
      mac80211: set TID of internal mgmt packets to 7

Ben Greear (1):
      ath5k: Fix lockup due to un-init spinlock.

Bing Zhao (1):
      mwifiex: cleanup ioctl.h

Chaoming Li (13):
      rtlwifi: rtl8192se: Merge def.h
      rtlwifi: rtl8192se: Merge dynamic management routines
      rtlwifi: rtl8192se: Merge firmware routines
      rtlwifi: rtl8192se: Merge hardware routines
      rtlwifi: rtl8192se: Merge led routines
      rtlwifi: rtl8192se: Merge phy routines
      rtlwifi: rtl8192se: Merge register definitions
      rtlwifi: rtl8192se: Merge rf routines
      rtlwifi: rtl8192se: Merge main (sw) routines
      rtlwifi: rtl8192se: Merge table routines
      rtlwifi: rtl8192se: Merge TX and RX routines
      rtlwifi: rtl8192se: Modify Kconfig and Makefile routines for new driver
      rtlwifi: rtl8192se: Remove need to disable ASPM

Christian Lamparter (2):
      carl9170: fix -Wunused-but-set-variable warnings
      p54pci: fix -Wunused-but-set-variable warnings

Christoph Fritz (1):
      mwifiex: fix null derefs, mem leaks and trivia

Cindy H. Kao (1):
      iwlwifi: support the svtool messages interactions through nl80211 test mode

Daniel Drake (1):
      libertas: remove tx_timeout handler

Daniel Halperin (1):
      mac80211: fix contention time computation in minstrel, minstrel_ht

Eliad Peller (13):
      wl12xx: sleep instead of wakeup after tx work
      wl12xx: avoid premature elp entrance
      wl12xx: print actual rx packet size (without padding)
      wl12xx: avoid redundant join on interface reconfiguration
      wl12xx: configure rates when working in ibss mode
      wl12xx: add debugfs entries for dtim_interval and beacon_interval
      wl12xx: simplify wl1271_ssid_set()
      wl12xx_sdio: set interrupt as wake_up interrupt
      wl12xx: declare suspend/resume callbacks (for wowlan)
      wl12xx_sdio: set MMC_PM_KEEP_POWER flag on suspend
      wl12xx: prevent scheduling while suspending (WoW enabled)
      wl12xx_sdio: declare support for NL80211_WOW_TRIGGER_ANYTHING trigger
      wl12xx: enter/exit psm on wowlan suspend/resume

Fabrice Deyber (1):
      mac80211: Only process mesh PREPs with equal seq number if metric is better.

Felix Fietkau (1):
      ath9k: fix a regression in PS frame filter handling

Gertjan van Wingerde (2):
      rt2x00: Initial support for RT5370 USB devices.
      rt2x00: Fix rmmod hang of rt2800pci

Gustavo F. Padovan (4):
      Bluetooth: Add l2cap_add_psm() and l2cap_add_scid()
      Bluetooth: Handle psm == 0 case inside l2cap_add_psm()
      Bluetooth: Remove l2cap_sk_list
      Bluetooth: Remove leftover debug messages

Hauke Mehrtens (1):
      wl12xx: do not set queue_mapping directly

Ido Yariv (3):
      wl12xx: Modify memory configuration for 128x/AP
      wl12xx: Restart TX when TX descriptors are available
      wl12xx: Enable dynamic memory for 127x

Ivo van Doorn (1):
      rt2x00: Fix transfer speed regression for USB hardware

Javier Cardona (12):
      nl80211: Introduce NL80211_MESH_SETUP_USERSPACE_AMPE
      mac80211: Let userspace send action frames over mesh interfaces
      mac80211: Drop MESH_PLINK category and use new ANA-approved MESH_ACTION
      open80211s: Stop using zero for address 3 in mesh plink mgmt frames
      cfg80211: Use capability info to detect mesh beacons.
      nl80211: Let userspace drive the peer link management states.
      mac80211: Check size of a new mesh path table for changes since allocation.
      mac80211: Fix locking bug on mesh path table access
      mac80211: Move call to mpp_path_lookup inside RCU-read section
      mac80211: allow setting supported rates on mesh peers
      ath9k: fix beaconing for mesh interfaces
      nl80211: Move peer link state definition to nl80211

Joe Perches (3):
      rtlwifi: rtl8192cu: Fix memset/memcpy using sizeof(ptr) not sizeof(*ptr)
      libertas: Convert lbs_pr_<level> to pr_<level>
      libertas: Use netdev_<level> or dev_<level> where possible

Johannes Berg (18):
      nl80211/cfg80211: WoWLAN support
      mac80211: add basic support for WoWLAN
      iwlagn: remove get_hcmd_size indirection
      iwlagn: remove frame pre-allocation
      iwlagn: remove unused variable
      iwlagn: dont update bytecount table for command queue
      iwlagn: remove bytecount indirection
      iwlagn: check DMA mapping errors
      iwlagn: fix iwl_is_any_associated
      cfg80211: restrict AP beacon intervals
      mac80211: remove pointless mesh path timer RCU code
      mac80211: make key locking clearer
      mac80211: fix another key non-race
      mac80211: fix a few RCU issues
      mac80211: mesh: move some code to make it static
      cfg80211: advertise possible interface combinations
      mac80211: fix TX a-MPDU locking
      mac80211: sparse RCU annotations

John W. Linville (6):
      Merge branch 'for-linville' of git://git.kernel.org/.../luca/wl12xx
      Merge branch 'wireless-next-2.6' of git://git.kernel.org/.../iwlwifi/iwlwifi-2.6
      Merge branch 'master' of git://git.kernel.org/.../padovan/bluetooth-next-2.6
      ssb: fix pcicore build breakage
      Merge branch 'for-linville' of git://git.kernel.org/.../luca/wl12xx
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless-next-2.6 into for-davem

Jouni Malinen (2):
      nl80211: Fix set_key regression with some drivers
      cfg80211: Remove unused wiphy flag

Julia Lawall (1):
      net/rfkill/core.c: Avoid leaving freed data in a list

Larry Finger (2):
      mac80211: Fix build error when CONFIG_PM is not defined
      rtlwifi: Move 2 large arrays off stack

Luciano Coelho (15):
      wl12xx: strict_stroul introduced converted to kstrtoul
      Revert "wl12xx: support FW TX inactivity triggers"
      mac80211: don't drop frames where skb->len < 24 in ieee80211_scan_rx()
      mac80211: add a couple of trace event classes to reduce duplicated code
      cfg80211/nl80211: add support for scheduled scans
      mac80211: add support for HW scheduled scan
      cfg80211/nl80211: add interval attribute for scheduled scans
      cfg80211/mac80211: avoid bounce back mac->cfg->mac on sched_scan_stopped
      wl12xx: add configuration values for scheduled scan
      wl12xx: listen to scheduled scan events
      wl12xx: add scheduled scan structures and commands
      wl12xx: implement scheduled scan driver operations and reporting
      wl12xx: export scheduled scan state in debugfs
      wl12xx: prevent sched_scan when not idle or not in station mode
      wl12xx: remove unused flag WL1271_FLAG_IDLE_REQUESTED

Luis R. Rodriguez (2):
      ath9k_hw: fix power for the HT40 duplicate frames
      ath9k_hw: fix dual band assumption for XB113

Mohammed Shafi Shajakhan (12):
      ath9k_hw: remove aggregation protection mode
      ath9k_hw: remove get_channel_noise function
      ath9k_hw: make antenna diversity modules chip specific
      ath9k_hw: enable Antenna diversity for AR9485
      ath9k_hw: define registers/macros to support Antenna diversity
      ath9k_hw: config diversity based on eeprom contents
      ath9k_hw: define modules to get/set Antenna diversity paramaters
      ath9k_hw: define antenna diversity group
      ath9k: Implement an API to swap main/ALT LNA's
      ath9k: configure fast_div_bias based on diversity group
      ath9k: make sure main_rssi is positive
      ath9k: make npending frames check as bool

Nicolas Cavallari (1):
      carl9170: fix allmulticast mode

Rafał Miłecki (13):
      b43: drop invalid IMCFGLO workaround
      b43legacy: drop invalid IMCFGLO workaround
      b43: drop ssb-duplicated workaround for dangling cores
      b43legacy: drop ssb-duplicated workaround for dangling cores
      b43: trivial: include ssb word in ssb specific functions
      bcma: add Broadcom specific AMBA bus driver
      ssb: update list of devices supporting multiple 80211 cores
      b43legacy: trivial: use TMSLOW def instead of magic value
      b43: move MAC PHY clock controling function
      bcma: add missing GPIO defines, use PULL register only when available
      ssb: move ssb_commit_settings and export it
      b43: implement timeouts workaround
      bcma: pci: trivial: correct amount of maximum retries

Rajkumar Manoharan (11):
      ath9k: Fix drain txq failure in flush
      mac80211: use wake_queue to restart trasmit
      mac80211: Postpond ps timer if tx is stopped by others
      ath9k_hw: do noise floor calibration only on required chains
      wireless: Fix warnings due to -Wunused-but-set-variable
      ath9k: avoid enabling interrupts while processing rx
      ath9k: process TSF out of range before RX
      ath9k_hw: Corrected xpabiaslevel register settings for AR9340
      ath9k_hw: Change DCU backoff thresh for AR9340
      ath9k: Fix rssi update in ad-hoc mode
      ath9k: Failed to set default beacon rssi in AP/IBSS mode

Sascha Silbe (1):
      libertas: Add libertas_disablemesh module parameter to disable mesh interface

Senthil Balasubramanian (1):
      ath9k_hw: Fix STA connection issues with AR9380 (XB113).

Shahar Levi (6):
      wl12xx: Set End-of-transaction Flag at Wl127x AP Mode
      wl12xx: Set correct REF CLK and TCXO CLK values to the FW
      wl12xx: FM WLAN coexistence
      wl12xx: Update Power Save Exit Retries Packets
      wl12xx: Don't filter beacons that include changed HT IEs
      wl12xx: add IEEE80211_HW_SPECTRUM_MGMT bit to the hw flags

Stephen Boyd (2):
      iwlegacy: Silence DEBUG_STRICT_USER_COPY_CHECKS=y warning
      iwlwifi: Silence DEBUG_STRICT_USER_COPY_CHECKS=y warning

Thomas Pedersen (3):
      nl80211: allow installing keys for a meshif
      nl80211: allow setting MFP flag for a meshif
      mac80211: Self-protected management frames are not robust

Vinicius Costa Gomes (2):
      Bluetooth: Add support for sending connection events for LE links
      Bluetooth: Add support for disconnecting LE links via mgmt

Waldemar Rymarkiewicz (1):
      Bluetooth: Double check sec req for pre 2.1 device

Wey-Yi Guy (1):
      iwlagn: led stay solid on when no traffic

Yogesh Ashok Powar (6):
      mwifiex: remove unnecessary variable initialization
      mwl8k: Fix broken WEP
      mwl8k: Do not ask mac80211 to generate IV for crypto keys
      mac80211: Fix mesh-related build breakage...
      cfg80211: make stripping of 802.11 header optional from AMSDU
      mwifiex: use ieee80211_amsdu_to_8023s routine

 Documentation/ABI/testing/sysfs-bus-bcma        |   31 +
 MAINTAINERS                                     |    7 +
 drivers/Kconfig                                 |    2 +
 drivers/Makefile                                |    1 +
 drivers/bcma/Kconfig                            |   33 +
 drivers/bcma/Makefile                           |    7 +
 drivers/bcma/README                             |   19 +
 drivers/bcma/TODO                               |    3 +
 drivers/bcma/bcma_private.h                     |   28 +
 drivers/bcma/core.c                             |   51 +
 drivers/bcma/driver_chipcommon.c                |   89 +
 drivers/bcma/driver_chipcommon_pmu.c            |  134 ++
 drivers/bcma/driver_pci.c                       |  163 ++
 drivers/bcma/host_pci.c                         |  196 ++
 drivers/bcma/main.c                             |  247 +++
 drivers/bcma/scan.c                             |  360 ++++
 drivers/bcma/scan.h                             |   56 +
 drivers/bluetooth/ath3k.c                       |    1 +
 drivers/bluetooth/btusb.c                       |    1 +
 drivers/net/wireless/ath/ath5k/base.c           |    2 +-
 drivers/net/wireless/ath/ath9k/ar9002_mac.c     |   10 -
 drivers/net/wireless/ath/ath9k/ar9002_phy.c     |   44 +-
 drivers/net/wireless/ath/ath9k/ar9003_eeprom.c  |   78 +-
 drivers/net/wireless/ath/ath9k/ar9003_mac.c     |   11 -
 drivers/net/wireless/ath/ath9k/ar9003_phy.c     |   46 +
 drivers/net/wireless/ath/ath9k/ar9003_phy.h     |   22 +
 drivers/net/wireless/ath/ath9k/ath9k.h          |    3 +-
 drivers/net/wireless/ath/ath9k/beacon.c         |   15 +-
 drivers/net/wireless/ath/ath9k/calib.c          |   21 +-
 drivers/net/wireless/ath/ath9k/calib.h          |    1 -
 drivers/net/wireless/ath/ath9k/hw-ops.h         |   16 +-
 drivers/net/wireless/ath/ath9k/hw.c             |   16 +
 drivers/net/wireless/ath/ath9k/hw.h             |   15 +-
 drivers/net/wireless/ath/ath9k/mac.c            |    9 +-
 drivers/net/wireless/ath/ath9k/main.c           |   47 +-
 drivers/net/wireless/ath/ath9k/recv.c           |  215 ++-
 drivers/net/wireless/ath/ath9k/xmit.c           |   11 +-
 drivers/net/wireless/ath/carl9170/main.c        |    2 +-
 drivers/net/wireless/ath/carl9170/tx.c          |    9 -
 drivers/net/wireless/b43/main.c                 |   69 +-
 drivers/net/wireless/b43/main.h                 |    1 +
 drivers/net/wireless/b43/phy_n.c                |   13 +-
 drivers/net/wireless/b43legacy/main.c           |   52 +-
 drivers/net/wireless/iwlegacy/iwl-4965-rs.c     |    2 +-
 drivers/net/wireless/iwlwifi/Kconfig            |   10 +
 drivers/net/wireless/iwlwifi/Makefile           |    1 +
 drivers/net/wireless/iwlwifi/iwl-1000.c         |    2 -
 drivers/net/wireless/iwlwifi/iwl-2000.c         |    2 -
 drivers/net/wireless/iwlwifi/iwl-5000.c         |    4 -
 drivers/net/wireless/iwlwifi/iwl-6000.c         |    4 -
 drivers/net/wireless/iwlwifi/iwl-agn-hcmd.c     |    7 -
 drivers/net/wireless/iwlwifi/iwl-agn-rs.c       |    2 +-
 drivers/net/wireless/iwlwifi/iwl-agn-tx.c       |  104 +-
 drivers/net/wireless/iwlwifi/iwl-agn-ucode.c    |    2 +-
 drivers/net/wireless/iwlwifi/iwl-agn.c          |  143 +-
 drivers/net/wireless/iwlwifi/iwl-agn.h          |   22 +-
 drivers/net/wireless/iwlwifi/iwl-core.h         |    6 -
 drivers/net/wireless/iwlwifi/iwl-dev.h          |   34 +-
 drivers/net/wireless/iwlwifi/iwl-led.c          |   20 +-
 drivers/net/wireless/iwlwifi/iwl-sv-open.c      |  469 +++++
 drivers/net/wireless/iwlwifi/iwl-testmode.h     |  151 ++
 drivers/net/wireless/iwlwifi/iwl-tx.c           |   22 +-
 drivers/net/wireless/iwmc3200wifi/rx.c          |    3 +-
 drivers/net/wireless/libertas/cfg.c             |   16 +-
 drivers/net/wireless/libertas/cmd.c             |   40 +-
 drivers/net/wireless/libertas/cmdresp.c         |   27 +-
 drivers/net/wireless/libertas/debugfs.c         |    5 +-
 drivers/net/wireless/libertas/defs.h            |    7 -
 drivers/net/wireless/libertas/if_cs.c           |   57 +-
 drivers/net/wireless/libertas/if_sdio.c         |   37 +-
 drivers/net/wireless/libertas/if_spi.c          |   83 +-
 drivers/net/wireless/libertas/if_usb.c          |   44 +-
 drivers/net/wireless/libertas/main.c            |   72 +-
 drivers/net/wireless/libertas/mesh.c            |    8 +-
 drivers/net/wireless/libertas/rx.c              |    7 +-
 drivers/net/wireless/mwifiex/11n.c              |    7 +-
 drivers/net/wireless/mwifiex/11n_aggr.c         |  136 +--
 drivers/net/wireless/mwifiex/11n_rxreorder.c    |   15 +-
 drivers/net/wireless/mwifiex/cfg80211.c         |   48 +-
 drivers/net/wireless/mwifiex/cmdevt.c           |   51 +-
 drivers/net/wireless/mwifiex/debugfs.c          |   10 +-
 drivers/net/wireless/mwifiex/fw.h               |   21 +-
 drivers/net/wireless/mwifiex/init.c             |   59 +-
 drivers/net/wireless/mwifiex/ioctl.h            |   81 +-
 drivers/net/wireless/mwifiex/join.c             |    9 +-
 drivers/net/wireless/mwifiex/main.c             |   27 +-
 drivers/net/wireless/mwifiex/main.h             |    8 +-
 drivers/net/wireless/mwifiex/scan.c             |   30 +-
 drivers/net/wireless/mwifiex/sdio.c             |   35 +-
 drivers/net/wireless/mwifiex/sta_cmd.c          |    8 +-
 drivers/net/wireless/mwifiex/sta_cmdresp.c      |   28 +-
 drivers/net/wireless/mwifiex/sta_ioctl.c        |   80 +-
 drivers/net/wireless/mwifiex/sta_rx.c           |   26 +-
 drivers/net/wireless/mwifiex/sta_tx.c           |    4 +-
 drivers/net/wireless/mwifiex/txrx.c             |   10 +-
 drivers/net/wireless/mwifiex/util.c             |   29 +-
 drivers/net/wireless/mwifiex/wmm.c              |    2 +-
 drivers/net/wireless/mwl8k.c                    |   18 +-
 drivers/net/wireless/p54/p54pci.c               |    3 +-
 drivers/net/wireless/rt2x00/Kconfig             |   11 +-
 drivers/net/wireless/rt2x00/rt2800.h            |    2 +
 drivers/net/wireless/rt2x00/rt2800lib.c         |    5 +-
 drivers/net/wireless/rt2x00/rt2800usb.c         |    8 +
 drivers/net/wireless/rt2x00/rt2x00dev.c         |    2 +-
 drivers/net/wireless/rt2x00/rt2x00usb.c         |    6 +-
 drivers/net/wireless/rtlwifi/Kconfig            |   15 +-
 drivers/net/wireless/rtlwifi/Makefile           |    1 +
 drivers/net/wireless/rtlwifi/efuse.c            |   35 +-
 drivers/net/wireless/rtlwifi/pci.c              |    1 +
 drivers/net/wireless/rtlwifi/rtl8192cu/trx.c    |    4 +-
 drivers/net/wireless/rtlwifi/rtl8192se/Makefile |   15 +
 drivers/net/wireless/rtlwifi/rtl8192se/def.h    |  598 ++++++
 drivers/net/wireless/rtlwifi/rtl8192se/dm.c     |  733 +++++++
 drivers/net/wireless/rtlwifi/rtl8192se/dm.h     |  164 ++
 drivers/net/wireless/rtlwifi/rtl8192se/fw.c     |  654 ++++++
 drivers/net/wireless/rtlwifi/rtl8192se/fw.h     |  375 ++++
 drivers/net/wireless/rtlwifi/rtl8192se/hw.c     | 2512 +++++++++++++++++++++++
 drivers/net/wireless/rtlwifi/rtl8192se/hw.h     |   79 +
 drivers/net/wireless/rtlwifi/rtl8192se/led.c    |  149 ++
 drivers/net/wireless/rtlwifi/rtl8192se/led.h    |   37 +
 drivers/net/wireless/rtlwifi/rtl8192se/phy.c    | 1740 ++++++++++++++++
 drivers/net/wireless/rtlwifi/rtl8192se/phy.h    |  101 +
 drivers/net/wireless/rtlwifi/rtl8192se/reg.h    | 1188 +++++++++++
 drivers/net/wireless/rtlwifi/rtl8192se/rf.c     |  546 +++++
 drivers/net/wireless/rtlwifi/rtl8192se/rf.h     |   43 +
 drivers/net/wireless/rtlwifi/rtl8192se/sw.c     |  423 ++++
 drivers/net/wireless/rtlwifi/rtl8192se/sw.h     |   36 +
 drivers/net/wireless/rtlwifi/rtl8192se/table.c  |  634 ++++++
 drivers/net/wireless/rtlwifi/rtl8192se/table.h  |   49 +
 drivers/net/wireless/rtlwifi/rtl8192se/trx.c    |  976 +++++++++
 drivers/net/wireless/rtlwifi/rtl8192se/trx.h    |   45 +
 drivers/net/wireless/wl12xx/acx.c               |  190 ++-
 drivers/net/wireless/wl12xx/acx.h               |  103 +-
 drivers/net/wireless/wl12xx/boot.c              |    6 +-
 drivers/net/wireless/wl12xx/cmd.c               |   18 +-
 drivers/net/wireless/wl12xx/conf.h              |  111 +-
 drivers/net/wireless/wl12xx/debugfs.c           |  240 +++
 drivers/net/wireless/wl12xx/event.c             |   70 +-
 drivers/net/wireless/wl12xx/event.h             |   12 +-
 drivers/net/wireless/wl12xx/init.c              |  110 +-
 drivers/net/wireless/wl12xx/init.h              |    2 +
 drivers/net/wireless/wl12xx/main.c              |  492 ++++-
 drivers/net/wireless/wl12xx/ps.c                |   30 +-
 drivers/net/wireless/wl12xx/ps.h                |    2 +
 drivers/net/wireless/wl12xx/rx.c                |   36 +-
 drivers/net/wireless/wl12xx/scan.c              |  243 +++
 drivers/net/wireless/wl12xx/scan.h              |  114 +
 drivers/net/wireless/wl12xx/sdio.c              |   64 +-
 drivers/net/wireless/wl12xx/tx.c                |   13 +-
 drivers/net/wireless/wl12xx/tx.h                |    2 +-
 drivers/net/wireless/wl12xx/wl12xx.h            |   14 +-
 drivers/ssb/driver_pcicore.c                    |   26 -
 drivers/ssb/main.c                              |   31 +
 drivers/ssb/scan.c                              |    5 +-
 include/linux/bcma/bcma.h                       |  224 ++
 include/linux/bcma/bcma_driver_chipcommon.h     |  302 +++
 include/linux/bcma/bcma_driver_pci.h            |   89 +
 include/linux/bcma/bcma_regs.h                  |   34 +
 include/linux/ieee80211.h                       |   11 +-
 include/linux/mod_devicetable.h                 |   17 +
 include/linux/nl80211.h                         |  302 +++-
 include/linux/ssb/ssb.h                         |    1 +
 include/net/bluetooth/hci_core.h                |    1 +
 include/net/bluetooth/l2cap.h                   |    9 +-
 include/net/cfg80211.h                          |  237 +++-
 include/net/mac80211.h                          |   66 +
 net/bluetooth/hci_conn.c                        |   17 +
 net/bluetooth/hci_event.c                       |    5 +-
 net/bluetooth/l2cap_core.c                      |  193 ++-
 net/bluetooth/l2cap_sock.c                      |   72 +-
 net/bluetooth/mgmt.c                            |    3 +
 net/bluetooth/rfcomm/core.c                     |    2 +-
 net/mac80211/agg-rx.c                           |    3 +-
 net/mac80211/agg-tx.c                           |   59 +-
 net/mac80211/cfg.c                              |  135 +-
 net/mac80211/debugfs.c                          |    2 +-
 net/mac80211/debugfs_key.c                      |   21 +-
 net/mac80211/driver-ops.h                       |   56 +-
 net/mac80211/driver-trace.h                     |  228 ++-
 net/mac80211/ht.c                               |   27 +-
 net/mac80211/ibss.c                             |   11 +-
 net/mac80211/ieee80211_i.h                      |   41 +-
 net/mac80211/iface.c                            |    3 +-
 net/mac80211/key.c                              |   30 +-
 net/mac80211/key.h                              |    4 +
 net/mac80211/main.c                             |   37 +-
 net/mac80211/mesh.c                             |   47 +-
 net/mac80211/mesh.h                             |    6 +-
 net/mac80211/mesh_hwmp.c                        |   38 +-
 net/mac80211/mesh_pathtbl.c                     |  123 +-
 net/mac80211/mesh_plink.c                       |   83 +-
 net/mac80211/mlme.c                             |   22 +-
 net/mac80211/pm.c                               |   13 +-
 net/mac80211/rc80211_minstrel.c                 |    4 +-
 net/mac80211/rc80211_minstrel_ht.c              |   27 +-
 net/mac80211/rx.c                               |   17 +-
 net/mac80211/scan.c                             |  122 ++-
 net/mac80211/sta_info.c                         |   19 +-
 net/mac80211/sta_info.h                         |   50 +-
 net/mac80211/tx.c                               |   10 +-
 net/mac80211/util.c                             |   19 +
 net/rfkill/core.c                               |    2 +-
 net/wireless/core.c                             |   89 +-
 net/wireless/core.h                             |   33 +
 net/wireless/lib80211_crypt_wep.c               |    3 +-
 net/wireless/mlme.c                             |   10 +
 net/wireless/nl80211.c                          |  670 ++++++-
 net/wireless/nl80211.h                          |    4 +
 net/wireless/reg.c                              |    2 -
 net/wireless/scan.c                             |   77 +-
 net/wireless/sysfs.c                            |    2 +-
 net/wireless/util.c                             |  126 ++-
 scripts/mod/file2alias.c                        |   22 +
 213 files changed, 19165 insertions(+), 2153 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-bcma
 create mode 100644 drivers/bcma/Kconfig
 create mode 100644 drivers/bcma/Makefile
 create mode 100644 drivers/bcma/README
 create mode 100644 drivers/bcma/TODO
 create mode 100644 drivers/bcma/bcma_private.h
 create mode 100644 drivers/bcma/core.c
 create mode 100644 drivers/bcma/driver_chipcommon.c
 create mode 100644 drivers/bcma/driver_chipcommon_pmu.c
 create mode 100644 drivers/bcma/driver_pci.c
 create mode 100644 drivers/bcma/host_pci.c
 create mode 100644 drivers/bcma/main.c
 create mode 100644 drivers/bcma/scan.c
 create mode 100644 drivers/bcma/scan.h
 create mode 100644 drivers/net/wireless/iwlwifi/iwl-sv-open.c
 create mode 100644 drivers/net/wireless/iwlwifi/iwl-testmode.h
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/Makefile
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/def.h
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/dm.c
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/dm.h
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/fw.c
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/fw.h
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/hw.c
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/hw.h
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/led.c
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/led.h
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/phy.c
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/phy.h
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/reg.h
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/rf.c
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/rf.h
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/sw.c
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/sw.h
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/table.c
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/table.h
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/trx.c
 create mode 100644 drivers/net/wireless/rtlwifi/rtl8192se/trx.h
 create mode 100644 include/linux/bcma/bcma.h
 create mode 100644 include/linux/bcma/bcma_driver_chipcommon.h
 create mode 100644 include/linux/bcma/bcma_driver_pci.h
 create mode 100644 include/linux/bcma/bcma_regs.h

Omnibus patch available here:

	http://www.kernel.org/pub/linux/kernel/people/linville/wireless-next-2.6-2011-05-16.patch.bz2

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.


* Re: [PATCH V5 2/6 net-next] netdevice.h: Add zero-copy flag in netdevice
From: Michael S. Tsirkin @ 2011-05-16 21:14 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Shirley Ma, David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann,
	netdev, kvm, linux-kernel
In-Reply-To: <1305575253.2885.28.camel@bwh-desktop>

On Mon, May 16, 2011 at 08:47:33PM +0100, Ben Hutchings wrote:
> On Mon, 2011-05-16 at 12:38 -0700, Shirley Ma wrote:
> > On Mon, 2011-05-16 at 20:35 +0100, Ben Hutchings wrote:
> > > Sorry, bit 31 is taken.  You get the job of turning features into a
> > > wider bitmap.
> > 
> > :) will do it.
> 
> Bear in mind that feature masks are manipulated in many different
> places.  This is not a simple task.
> 
> See previous discussion at:
> http://thread.gmane.org/gmane.linux.network/193284
> and especially:
> http://thread.gmane.org/gmane.linux.network/193284/focus=193332
> 
> Ben.


IIUC, what is suggested above is something like:

typedef struct net_features {
} net_features_t;

and then

   void netdev_set_feature(net_features_t *net_features, int feature);
   void netdev_clear_feature(net_features_t *net_features, int feature);
   bool netdev_test_feature(net_features_t *net_features, int feature);


I think this might be the easiest way, as the compiler will catch any direct uses.
It can then be split up nicely.

It looks a bit different from what Dave suggested but I think it's
close enough?

We could also have wrappers that set/clear/test many features to replace
uses of A|B|C that are pretty common.

   static inline void netdev_set_features(net_features_t *net_features, int nfeatures, int *features)
   {
	int i;
	for (i = 0; i < nfeatures; ++i)
		netdev_set_feature(net_features, features[i]);
   }
   void netdev_clear_features(net_features_t *net_features, int nfeatures, int *features)
   {
	int i;
	for (i = 0; i < nfeatures; ++i)
		netdev_clear_feature(net_features, features[i]);
   }
   bool netdev_test_features(net_features_t *net_features, int nfeatures, int *features)
   {
	int i;
	for (i = 0; i < nfeatures; ++i)
		if (netdev_test_feature(net_features, features[i]))
			return true;
	return false;
   }

and possibly macros that get arrays of constants:

#define NETDEV_SET_FEATURES(net_features, feature_array) do { \
	int __NETDEV_SET_FEATURES_F[] = feature_array; \
	netdev_set_features((net_features), \
		ARRAY_SIZE(__NETDEV_SET_FEATURES_F), __NETDEV_SET_FEATURES_F); \
} while (0)

etc.
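
A minimal userspace sketch of the accessor idea above, to make the
"wider bitmap" concrete. The struct layout, the 64-bit capacity and the
use of unsigned int for the feature index are assumptions made for
illustration only; nothing here is taken from an actual kernel patch.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical capacity; the thread only establishes that 32 bits are not enough. */
#define NET_FEATURES_MAX 64u
#define NET_FEATURES_WORD_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define NET_FEATURES_WORDS \
	((NET_FEATURES_MAX + NET_FEATURES_WORD_BITS - 1) / NET_FEATURES_WORD_BITS)

/* Opaque-looking wrapper so that direct "features |= X" no longer compiles. */
typedef struct net_features {
	unsigned long bits[NET_FEATURES_WORDS];
} net_features_t;

/* The caller must keep feature < NET_FEATURES_MAX; no bounds check is done. */
static inline void netdev_set_feature(net_features_t *f, unsigned int feature)
{
	f->bits[feature / NET_FEATURES_WORD_BITS] |=
		1UL << (feature % NET_FEATURES_WORD_BITS);
}

static inline void netdev_clear_feature(net_features_t *f, unsigned int feature)
{
	f->bits[feature / NET_FEATURES_WORD_BITS] &=
		~(1UL << (feature % NET_FEATURES_WORD_BITS));
}

static inline bool netdev_test_feature(const net_features_t *f, unsigned int feature)
{
	return (f->bits[feature / NET_FEATURES_WORD_BITS] >>
		(feature % NET_FEATURES_WORD_BITS)) & 1UL;
}
```

Because callers only see bit numbers, the number of words can grow later
without touching them.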

> -- 
> Ben Hutchings, Senior Software Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH] ethtool: ETHTOOL_SFEATURES: remove NETIF_F_COMPAT return
From: Ben Hutchings @ 2011-05-16 21:08 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev, David Miller
In-Reply-To: <20110516205137.GA7667@rere.qmqm.pl>

On Mon, 2011-05-16 at 22:51 +0200, Michał Mirosław wrote:
> On Mon, May 16, 2011 at 03:53:17PM +0100, Ben Hutchings wrote:
> > On Mon, 2011-05-16 at 16:23 +0200, Michał Mirosław wrote:
> > > On Mon, May 16, 2011 at 02:37:46PM +0100, Ben Hutchings wrote:
> > > > On Mon, 2011-05-16 at 15:28 +0200, Michał Mirosław wrote:
> > > > > Remove NETIF_F_COMPAT since it's redundant and will be unused after
> > > > > all drivers are converted to fix/set_features.
> > > > > 
> > > > > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > > > > ---
> > > > > 
> > > > > For net as we don't want to have ETHTOOL_F_COMPAT hit stable release.
> > > > [...]
> > > > ETHTOOL_F_WISH means that the requested features could not all be
> > > > enabled, *but are remembered*.  ETHTOOL_F_COMPAT means they were not
> > > > remembered.
> > > Hmm. So, lets just revert 39fc0ce5710c53bad14aaba1a789eec810c556f9
> > > (net: Implement SFEATURES compatibility for not updated drivers).
> > That's also problematic because it means we can't make any use of the
> > 'available' masks from ETHTOOL_GFEATURES.
> > 
> > The patch I sent is actually tested with a modified ethtool.  The
> > fallback works.  I don't think you've tested whether any of your
> > proposals can actually practically be used by ethtool.
> 
> While reading your patches I noted some differences in the way we see
> the new [GS]FEATURES ops.
> 
> First, you make NETIF_F_* flags part of the ethtool ABI. In my approach
> feature names become an ABI instead. That's what ETH_SS_FEATURES string
> set is for, and that's what comments in kernel's <linux/ethtool.h>
> include say.

We've been through this before.  I can't use those names in ethtool
because they aren't the same as ethtool used previously.  I could make
it map strings to strings, but I don't see the point.

> dev->features are exposed directly by kernel only in two ways:
>  1. /sys/class/net/*/features - since NETIF_F_* flags are not exported
>     in headers for userspace, this should be treated like a debugging
>     facility and not an ABI
>  2. ETHTOOL_[GS]FLAGS - these export 5 flags (LRO, VLAN offload, NTuple,
>     and RX hashing) that are renamed to ETH_FLAG_* - only those constants
>     are in the ABI and only in relation with ETHTOOL_[GS]FLAGS
> 
> Second, you reimplement 'ethtool -K' using ETHTOOL_SFEATURES. Does this mean
> that we want to get rid of ETHTOOL_[GS]{FLAGS,SG,...} from kernel?

We must not.

> The
> assumptions in those calls are a bit different from ETHTOOL_[GS]FEATURES
> but there is a conversion layer in kernel that allows old binaries to
> work correctly in the common case. (-EOPNOTSUPP is still returned for
> drivers which can't change particular feature. The difference is seen
> only in that disabling and enabling e.g. checksumming won't disable other
> dependent features in the result.)
> 
> Right now we already agree that NETIF_F_COMPAT should go.
> 
> I'll send my idea of the ethtool code using ETHTOOL_[GS]FEATURES and
> keeping NETIF_F_* flags internal to the kernel. It adds new modes (-w/-W).
> This might be made even more useful by adding simple wildcard matching.

I've explained before that I do not want to add new options to do
(mostly) the same thing.  Users should not have to use a different
command depending on the kernel version.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Null pointer dereference in icmp_send
From: Aristide Fattori @ 2011-05-16 21:06 UTC (permalink / raw)
  To: netdev; +Cc: roberto.paleari

Hi everybody,

in function icmp_send() (net/ipv4/icmp.c), the parameter passed to
dev_net() function is not properly validated. This can lead to a NULL
pointer dereference that crashes the kernel. The bug can be triggered
remotely, by flooding the target with fragmented IPv4 packets.
Important fields in the IP packet are:
 * Flags: the MF flag must be set.
 * Fragment ID: using pseudo-random values for this field quickly
fills fragmented queues in the victim's kernel, as it is unable to
easily reassemble received packets.
 * TOS: using pseudo-random values for this field triggers the
creation of more than one route cache entry for the same destination
address, increasing the chances of incurring in the error condition
described before.
Other fields of the packet do not really matter, and they can be set
to arbitrary values.
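
To illustrate the kind of validation that is missing, here is a small
userspace model, not the kernel code and not a proposed fix. All types
are simplified stand-ins, and the assumption that the NULL pointer is
the route's device is ours for the purpose of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct rtable_model { struct net_device *dev; };

/* Like dev_net(), this dereferences dev unconditionally. */
static struct net *dev_net_model(const struct net_device *dev)
{
	return dev->nd_net;
}

/*
 * Returns true if an ICMP error could be generated, false if we bail
 * out early because there is no device to take the namespace from.
 */
static bool icmp_send_model(const struct rtable_model *rt)
{
	if (rt == NULL || rt->dev == NULL)
		return false;	/* validate before calling dev_net_model() */

	return dev_net_model(rt->dev) != NULL;
}
```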

If you are interested, we can provide a small and very dirty python
script that easily triggers the error condition.

Greetings,
Aristide Fattori
Roberto Paleari

-- 
GnuPG Key on keyserver.pgp.com ID 0x25578128
http://security.dico.unimi.it/~joystick/


* [PATCH net-2.6] net: Keep TX queues stopped as long as the physical device is absent
From: Ben Hutchings @ 2011-05-16 21:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

netif_device_detach() stops all TX queues, but there is nothing to
prevent them from being restarted.  In fact, netif_tx_unlock() may now
do this.  Add another queue state flag that is set while the device is
absent, and make netif_tx_queue_frozen_or_stopped() test it.  Rename the
function to netif_tx_queue_blocked() since it makes little sense to keep
adding flags to its name.

This bug appears to have been present forever, but had little effect
before commit c3f26a269c2421f97f10cf8ed05d5099b573af4d ('netdev: Fix
lockdep warnings in multiqueue configurations.').

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org
---
 include/linux/netdevice.h |   19 +++++++++++++++----
 net/core/dev.c            |   16 ++++++++++++++++
 net/core/netpoll.c        |    2 +-
 net/core/pktgen.c         |    2 +-
 net/sched/sch_generic.c   |    6 +++---
 net/sched/sch_teql.c      |    2 +-
 6 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0249fe7..1727723 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -544,8 +544,10 @@ static inline void napi_synchronize(const struct napi_struct *n)
 enum netdev_queue_state_t {
 	__QUEUE_STATE_XOFF,
 	__QUEUE_STATE_FROZEN,
-#define QUEUE_STATE_XOFF_OR_FROZEN ((1 << __QUEUE_STATE_XOFF)		| \
-				    (1 << __QUEUE_STATE_FROZEN))
+	__QUEUE_STATE_ABSENT,
+#define QUEUE_STATE_BLOCKED ((1 << __QUEUE_STATE_XOFF)		| \
+			     (1 << __QUEUE_STATE_FROZEN)	| \
+			     (1 << __QUEUE_STATE_ABSENT))
 };
 
 struct netdev_queue {
@@ -1897,9 +1899,18 @@ static inline int netif_queue_stopped(const struct net_device *dev)
 	return netif_tx_queue_stopped(netdev_get_tx_queue(dev, 0));
 }
 
-static inline int netif_tx_queue_frozen_or_stopped(const struct netdev_queue *dev_queue)
+/**
+ *	netif_tx_queue_blocked - test if TX queue is blocked for any reason
+ *	@dev_queue: Transmit queue
+ *
+ *	Test whether transmit queue is blocked.  This could happen
+ *	because it was explicitly stopped (usually due to a hardware
+ *	queue filling up), because the device transmit state is locked,
+ *	or because the hardware device was detached.
+ */
+static inline int netif_tx_queue_blocked(const struct netdev_queue *dev_queue)
 {
-	return dev_queue->state & QUEUE_STATE_XOFF_OR_FROZEN;
+	return dev_queue->state & QUEUE_STATE_BLOCKED;
 }
 
 /**
diff --git a/net/core/dev.c b/net/core/dev.c
index 9200944..6b1205a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1740,6 +1740,14 @@ EXPORT_SYMBOL(dev_kfree_skb_any);
  */
 void netif_device_detach(struct net_device *dev)
 {
+	struct netdev_queue *txq;
+	unsigned int i;
+
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		txq = netdev_get_tx_queue(dev, i);
+		set_bit(__QUEUE_STATE_ABSENT, &txq->state);
+	}
+
 	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
 	    netif_running(dev)) {
 		netif_tx_stop_all_queues(dev);
@@ -1755,6 +1763,14 @@ EXPORT_SYMBOL(netif_device_detach);
  */
 void netif_device_attach(struct net_device *dev)
 {
+	struct netdev_queue *txq;
+	unsigned int i;
+
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		txq = netdev_get_tx_queue(dev, i);
+		clear_bit(__QUEUE_STATE_ABSENT, &txq->state);
+	}
+
 	if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
 	    netif_running(dev)) {
 		netif_tx_wake_all_queues(dev);
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 06be243..dac4c2c 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -75,7 +75,7 @@ static void queue_process(struct work_struct *work)
 
 		local_irq_save(flags);
 		__netif_tx_lock(txq, smp_processor_id());
-		if (netif_tx_queue_frozen_or_stopped(txq) ||
+		if (netif_tx_queue_blocked(txq) ||
 		    ops->ndo_start_xmit(skb, dev) != NETDEV_TX_OK) {
 			skb_queue_head(&npinfo->txq, skb);
 			__netif_tx_unlock(txq);
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index aeeece7..8dcf293 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3478,7 +3478,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 
 	__netif_tx_lock_bh(txq);
 
-	if (unlikely(netif_tx_queue_frozen_or_stopped(txq))) {
+	if (unlikely(netif_tx_queue_blocked(txq))) {
 		ret = NETDEV_TX_BUSY;
 		pkt_dev->last_ok = 0;
 		goto unlock;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index c84b659..df6d01b 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -60,7 +60,7 @@ static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
 
 		/* check the reason of requeuing without tx lock first */
 		txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
-		if (!netif_tx_queue_frozen_or_stopped(txq)) {
+		if (!netif_tx_queue_blocked(txq)) {
 			q->gso_skb = NULL;
 			q->q.qlen--;
 		} else
@@ -121,7 +121,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 	spin_unlock(root_lock);
 
 	HARD_TX_LOCK(dev, txq, smp_processor_id());
-	if (!netif_tx_queue_frozen_or_stopped(txq))
+	if (!netif_tx_queue_blocked(txq))
 		ret = dev_hard_start_xmit(skb, dev, txq);
 
 	HARD_TX_UNLOCK(dev, txq);
@@ -143,7 +143,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 		ret = dev_requeue_skb(skb, q);
 	}
 
-	if (ret && netif_tx_queue_frozen_or_stopped(txq))
+	if (ret && netif_tx_queue_blocked(txq))
 		ret = 0;
 
 	return ret;
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index 45cd300..f876462 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -312,7 +312,7 @@ restart:
 			if (__netif_tx_trylock(slave_txq)) {
 				unsigned int length = qdisc_pkt_len(skb);
 
-				if (!netif_tx_queue_frozen_or_stopped(slave_txq) &&
+				if (!netif_tx_queue_blocked(slave_txq) &&
 				    slave_ops->ndo_start_xmit(skb, slave) == NETDEV_TX_OK) {
 					txq_trans_update(slave_txq);
 					__netif_tx_unlock(slave_txq);
-- 
1.7.4


-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
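
The effect of the extra state bit in the patch above can be modelled in
plain C as follows. This is only a sketch: the names are hypothetical,
and it uses ordinary non-atomic bit operations where the kernel uses
set_bit()/clear_bit().

```c
#include <stdbool.h>

/* Bit numbers modelled on the enum in the patch; names are hypothetical. */
enum txq_state_bit {
	TXQ_STATE_XOFF,
	TXQ_STATE_FROZEN,
	TXQ_STATE_ABSENT,
};

#define TXQ_STATE_BLOCKED ((1UL << TXQ_STATE_XOFF)   | \
			   (1UL << TXQ_STATE_FROZEN) | \
			   (1UL << TXQ_STATE_ABSENT))

struct txq_model {
	unsigned long state;
};

/* Device removed: stop the queue and mark it absent. */
static void txq_detach(struct txq_model *q)
{
	q->state |= (1UL << TXQ_STATE_XOFF) | (1UL << TXQ_STATE_ABSENT);
}

/* Something restarts the queue, e.g. on the unlock path. */
static void txq_wake(struct txq_model *q)
{
	q->state &= ~(1UL << TXQ_STATE_XOFF);
}

/* Device is back: only now may the queue become usable again. */
static void txq_attach(struct txq_model *q)
{
	q->state &= ~(1UL << TXQ_STATE_ABSENT);
}

static bool txq_blocked(const struct txq_model *q)
{
	return (q->state & TXQ_STATE_BLOCKED) != 0;
}
```

With only the XOFF bit, a wake after detach would make the queue look
usable; with the ABSENT bit it stays blocked until txq_attach().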



* Re: [PATCH V5 4/6 net-next] vhost: vhost TX zero-copy support
From: Shirley Ma @ 2011-05-16 20:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann, netdev,
	kvm, linux-kernel
In-Reply-To: <20110516204540.GD18148@redhat.com>

On Mon, 2011-05-16 at 23:45 +0300, Michael S. Tsirkin wrote:
> > +/* Since we need to keep the order of used_idx as avail_idx, it's
> possible that
> > + * DMA done not in order in lower device driver for some reason. To
> prevent
> > + * used_idx out of order, upend_idx is used to track avail_idx
> order, done_idx
> > + * is used to track used_idx order. Once lower device DMA done,
> then upend_idx
> > + * can move to done_idx.
> 
> Could you clarify this please? virtio explicitly allows out of order
> completion of requests. Does it simplify code that we try to keep
> used index updates in-order? Because if not, this is not
> really a requirement.

Hello Mike,

Based on my testing, vhost_add_used() must be called in the same order
as vhost_get_vq_desc(). Otherwise, the virtio_net ring seems to get
double-freed. I didn't spend time debugging this.

In virtqueue_get_buf():

        if (unlikely(!vq->data[i])) {
                BAD_RING(vq, "id %u is not a head!\n", i);
                return NULL;
        }

That's the reason I created the upend_idx and done_idx.
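
One way to read the upend_idx/done_idx scheme is sketched below. This is
a userspace model under our own assumptions (fixed ring size, one flag
per slot, hypothetical function names), not the code from the patch.
Completions may arrive in any order, but slots are only reported as used
in submission order.

```c
#include <stdbool.h>

#define ZC_RING_SIZE 8	/* hypothetical ring size */

struct zc_ring {
	bool dma_done[ZC_RING_SIZE];
	unsigned int upend_idx;	/* next slot to hand to the lower device */
	unsigned int done_idx;	/* oldest slot not yet reported as used */
};

/*
 * Queue one buffer in avail order and return its slot.  The caller must
 * keep fewer than ZC_RING_SIZE buffers outstanding, otherwise a full
 * ring cannot be told apart from an empty one.
 */
static unsigned int zc_submit(struct zc_ring *r)
{
	unsigned int slot = r->upend_idx;

	r->dma_done[slot] = false;
	r->upend_idx = (slot + 1) % ZC_RING_SIZE;
	return slot;
}

/* Called when the lower device finishes DMA; may happen out of order. */
static void zc_dma_complete(struct zc_ring *r, unsigned int slot)
{
	r->dma_done[slot] = true;
}

/*
 * Advance done_idx over every completed slot, stopping at the first one
 * still in flight.  Returns how many slots can now be reported as used.
 */
static unsigned int zc_reap_in_order(struct zc_ring *r)
{
	unsigned int n = 0;

	while (r->done_idx != r->upend_idx && r->dma_done[r->done_idx]) {
		r->done_idx = (r->done_idx + 1) % ZC_RING_SIZE;
		n++;
	}
	return n;
}
```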

Thanks
Shirley


* [RFC PATCH ethtool] ethtool: implement G/SFEATURES calls
From: Michał Mirosław @ 2011-05-16 20:54 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, David Miller
In-Reply-To: <20110516205137.GA7667@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 ethtool.c |  299 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 293 insertions(+), 6 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 34fe107..86a5a8b 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -33,6 +33,7 @@
 #include <limits.h>
 #include <ctype.h>
 
+#include <unistd.h>
 #include <sys/socket.h>
 #include <netinet/in.h>
 #include <arpa/inet.h>
@@ -83,6 +84,9 @@ static int do_gcoalesce(int fd, struct ifreq *ifr);
 static int do_scoalesce(int fd, struct ifreq *ifr);
 static int do_goffload(int fd, struct ifreq *ifr);
 static int do_soffload(int fd, struct ifreq *ifr);
+static void parse_sfeatures_args(int argc, char **argp, int argi);
+static int do_gfeatures(int fd, struct ifreq *ifr);
+static int do_sfeatures(int fd, struct ifreq *ifr);
 static int do_gstats(int fd, struct ifreq *ifr);
 static int rxflow_str_to_type(const char *str);
 static int parse_rxfhashopts(char *optstr, u32 *data);
@@ -119,6 +123,8 @@ static enum {
 	MODE_SRING,
 	MODE_GOFFLOAD,
 	MODE_SOFFLOAD,
+	MODE_GFEATURES,
+	MODE_SFEATURES,
 	MODE_GSTATS,
 	MODE_GNFC,
 	MODE_SNFC,
@@ -197,6 +203,10 @@ static struct option {
 		"		[ ntuple on|off ]\n"
 		"		[ rxhash on|off ]\n"
     },
+    { "-w", "--show-features", MODE_GFEATURES, "Get offload status" },
+    { "-W", "--request-features", MODE_SFEATURES, "Set requested offload",
+		"		[ feature-name on|off [...] ]\n"
+		"		see --show-features output for feature-name strings\n" },
     { "-i", "--driver", MODE_GDRV, "Show driver information" },
     { "-d", "--register-dump", MODE_GREGS, "Do a register dump",
 		"		[ raw on|off ]\n"
@@ -768,6 +778,8 @@ static void parse_cmdline(int argc, char **argp)
 			    (mode == MODE_SRING) ||
 			    (mode == MODE_GOFFLOAD) ||
 			    (mode == MODE_SOFFLOAD) ||
+			    (mode == MODE_GFEATURES) ||
+			    (mode == MODE_SFEATURES) ||
 			    (mode == MODE_GSTATS) ||
 			    (mode == MODE_GNFC) ||
 			    (mode == MODE_SNFC) ||
@@ -858,6 +870,11 @@ static void parse_cmdline(int argc, char **argp)
 				i = argc;
 				break;
 			}
+			if (mode == MODE_SFEATURES) {
+				parse_sfeatures_args(argc, argp, i);
+				i = argc;
+				break;
+			}
 			if (mode == MODE_SCLSRULE) {
 				if (!strcmp(argp[i], "flow-type")) {
 					i += 1;
@@ -1867,21 +1884,30 @@ static int dump_rxfhash(int fhash, u64 val)
 	return 0;
 }
 
-static int doit(void)
+static int get_control_socket(struct ifreq *ifr)
 {
-	struct ifreq ifr;
 	int fd;
 
 	/* Setup our control structures. */
-	memset(&ifr, 0, sizeof(ifr));
-	strcpy(ifr.ifr_name, devname);
+	memset(ifr, 0, sizeof(*ifr));
+	strcpy(ifr->ifr_name, devname);
 
 	/* Open control socket. */
 	fd = socket(AF_INET, SOCK_DGRAM, 0);
-	if (fd < 0) {
+	if (fd < 0)
 		perror("Cannot get control socket");
+
+	return fd;
+}
+
+static int doit(void)
+{
+	struct ifreq ifr;
+	int fd;
+
+	fd = get_control_socket(&ifr);
+	if (fd < 0)
 		return 70;
-	}
 
 	/* all of these are expected to populate ifr->ifr_data as needed */
 	if (mode == MODE_GDRV) {
@@ -1918,6 +1944,10 @@ static int doit(void)
 		return do_goffload(fd, &ifr);
 	} else if (mode == MODE_SOFFLOAD) {
 		return do_soffload(fd, &ifr);
+	} else if (mode == MODE_GFEATURES) {
+		return do_gfeatures(fd, &ifr);
+	} else if (mode == MODE_SFEATURES) {
+		return do_sfeatures(fd, &ifr);
 	} else if (mode == MODE_GSTATS) {
 		return do_gstats(fd, &ifr);
 	} else if (mode == MODE_GNFC) {
@@ -2355,6 +2385,263 @@ static int do_soffload(int fd, struct ifreq *ifr)
 	return 0;
 }
 
+static int get_feature_strings(int fd, struct ifreq *ifr,
+	struct ethtool_gstrings **strs)
+{
+	struct ethtool_sset_info *sset_info;
+	struct ethtool_gstrings *strings;
+	int sz_str, n_strings, err;
+
+	sset_info = malloc(sizeof(struct ethtool_sset_info) + sizeof(u32));
+	sset_info->cmd = ETHTOOL_GSSET_INFO;
+	sset_info->sset_mask = (1ULL << ETH_SS_FEATURES);
+	ifr->ifr_data = (caddr_t)sset_info;
+	err = send_ioctl(fd, ifr);
+
+	if ((err < 0) ||
+	    (!(sset_info->sset_mask & (1ULL << ETH_SS_FEATURES)))) {
+		perror("Cannot get driver strings info");
+		return -100;
+	}
+
+	n_strings = sset_info->data[0];
+	free(sset_info);
+	sz_str = n_strings * ETH_GSTRING_LEN;
+
+	strings = calloc(1, sz_str + sizeof(struct ethtool_gstrings));
+	if (!strings) {
+		fprintf(stderr, "no memory available\n");
+		return -95;
+	}
+
+	strings->cmd = ETHTOOL_GSTRINGS;
+	strings->string_set = ETH_SS_FEATURES;
+	strings->len = n_strings;
+	ifr->ifr_data = (caddr_t) strings;
+	err = send_ioctl(fd, ifr);
+	if (err < 0) {
+		perror("Cannot get feature strings information");
+		free(strings);
+		return -96;
+	}
+
+	*strs = strings;
+	return n_strings;
+}
+
+struct ethtool_sfeatures *features_req;
+
+static void parse_sfeatures_args(int argc, char **argp, int argi)
+{
+	struct cmdline_info *cmdline_desc, *cp;
+	struct ethtool_gstrings *strings;
+	struct ifreq ifr;
+	int n_strings, sz_features, i;
+	int fd, changed = 0;
+
+	fd = get_control_socket(&ifr);
+	if (fd < 0)
+		exit(100);
+
+	n_strings = get_feature_strings(fd, &ifr, &strings);
+	if (n_strings < 0)
+		exit(-n_strings);
+
+	sz_features = sizeof(*features_req->features) * ((n_strings + 31) / 32);
+
+	cp = cmdline_desc = calloc(n_strings, sizeof(*cmdline_desc));
+	features_req = calloc(1, sizeof(*features_req) + sz_features);
+	if (!cmdline_desc || !features_req) {
+		fprintf(stderr, "no memory available\n");
+		exit(95);
+	}
+
+	features_req->size = (n_strings + 31) / 32;
+
+	for (i = 0; i < n_strings; ++i) {
+		if (!strings->data[i*ETH_GSTRING_LEN])
+			continue;
+
+		strings->data[i*ETH_GSTRING_LEN + ETH_GSTRING_LEN-1] = 0;
+		cp->name = (const char *)strings->data + i * ETH_GSTRING_LEN;
+		cp->type = CMDL_FLAG;
+		cp->flag_val = 1 << (i % 32);
+		cp->wanted_val = &features_req->features[i / 32].requested;
+		cp->seen_val = &features_req->features[i / 32].valid;
+		++cp;
+	}
+
+	parse_generic_cmdline(argc, argp, argi, &changed,
+		cmdline_desc, cp - cmdline_desc);
+
+	free(cmdline_desc);
+	free(strings);
+	close(fd);
+
+	if (!changed) {
+		free(features_req);
+		features_req = NULL;
+	}
+}
+
+static int send_gfeatures(int fd, struct ifreq *ifr, int n_strings,
+	struct ethtool_gfeatures **features_p)
+{
+	struct ethtool_gfeatures *features;
+	int err, sz_features;
+
+	sz_features = sizeof(*features->features) * ((n_strings + 31) / 32);
+	features = calloc(1, sz_features + sizeof(*features));
+	if (!features) {
+		fprintf(stderr, "no memory available\n");
+		return 95;
+	}
+
+	features->cmd = ETHTOOL_GFEATURES;
+	features->size = (n_strings + 31) / 32;
+	ifr->ifr_data = (caddr_t) features;
+	err = send_ioctl(fd, ifr);
+
+	if (err < 0) {
+		perror("Cannot get feature status");
+		free(features);
+		return 97;
+	}
+
+	*features_p = features;
+	return 0;
+}
+
+static int do_gfeatures(int fd, struct ifreq *ifr)
+{
+	struct ethtool_gstrings *strings;
+	struct ethtool_gfeatures *features;
+	int n_strings, err, i;
+
+	n_strings = get_feature_strings(fd, ifr, &strings);
+	if (n_strings < 0)
+		return -n_strings;
+
+	err = send_gfeatures(fd, ifr, n_strings, &features);
+	if (err) {
+		free(strings);
+		return err;
+	}
+
+	fprintf(stdout, "Offload state:  (name: enabled,wanted,changable)\n");
+	for (i = 0; i < n_strings; i++) {
+		if (!strings->data[i * ETH_GSTRING_LEN])
+			continue;	/* empty */
+#define P_FLAG(f) \
+	(features->features[i / 32].f & (1 << (i % 32))) ? "yes" : " no"
+#define PA_FLAG(f) \
+	(features->features[i / 32].available & (1 << (i % 32))) ? P_FLAG(f) : "---"
+#define PN_FLAG(f) \
+	(features->features[i / 32].never_changed & (1 << (i % 32))) ? "---" : P_FLAG(f)
+		fprintf(stdout, "     %-*.*s %s,%s,%s\n",
+			ETH_GSTRING_LEN, ETH_GSTRING_LEN,
+			&strings->data[i * ETH_GSTRING_LEN],
+			P_FLAG(active), PA_FLAG(requested), PN_FLAG(available));
+#undef P_FLAG
+#undef PA_FLAG
+#undef PN_FLAG
+	}
+	free(strings);
+	free(features);
+
+	return 0;
+}
+
+static void print_gfeatures_diff(
+	const struct ethtool_get_features_block *expected,
+	const struct ethtool_get_features_block *set,
+	const char *strings, int n_strings)
+{
+	int i;
+
+	if (n_strings > 32)
+		n_strings = 32;
+
+	for (i = 0; i < n_strings; ++i) {
+		const char *name = &strings[i * ETH_GSTRING_LEN];
+		u32 mask = 1 << i;
+
+		if (!((expected->active ^ set->active) & mask))
+			continue;
+
+		fprintf(stderr, "feature %.*s is %s (expected: %s, saved: %s)\n",
+			ETH_GSTRING_LEN, name,
+			set->active & mask ? "enabled" : "disabled",
+			expected->active & mask ? "enabled" : "disabled",
+			!(set->available & mask) ? "not user-changeable" :
+				set->requested & mask ? "enabled" : "disabled"
+		);
+	}
+}
+
+static int do_sfeatures(int fd, struct ifreq *ifr)
+{
+	struct ethtool_gstrings *strings;
+	struct ethtool_gfeatures *features0, *features1;
+	int n_strings, err, i;
+
+	if (!features_req) {
+		fprintf(stderr, "no features changed\n");
+		return 97;
+	}
+
+	n_strings = get_feature_strings(fd, ifr, &strings);
+	if (n_strings < 0) {
+		free(features_req);
+		return -n_strings;
+	}
+
+	err = send_gfeatures(fd, ifr, n_strings, &features0);
+	if (err) {
+		perror("Cannot read features");
+		goto free_strings;
+	}
+
+	features_req->cmd = ETHTOOL_SFEATURES;
+	ifr->ifr_data = (caddr_t) features_req;
+	err = send_ioctl(fd, ifr);
+	if (err < 0) {
+		perror("Cannot change features");
+		err = 97;
+		goto free_features;
+	}
+
+	err = send_gfeatures(fd, ifr, n_strings, &features1);
+	if (err) {
+		perror("Cannot verify features");
+		goto free_features;
+	}
+
+	/* make features0 .active what we expect to be set */
+	i = (n_strings + 31) / 32;
+	while (i--) {
+		features0->features[i].active &= ~features_req->features[i].valid;
+		features0->features[i].active |=
+			features_req->features[i].requested &
+			features_req->features[i].valid;
+	}
+
+	for (i = 0; i < n_strings; i += 32)
+		print_gfeatures_diff(&features0->features[i / 32],
+			&features1->features[i / 32],
+			(char *)&strings->data[i * ETH_GSTRING_LEN],
+			n_strings - i);
+
+	free(features1);
+free_features:
+	free(features0);
+free_strings:
+	free(strings);
+	free(features_req);
+	return err;
+}
+
+
 static int do_gset(int fd, struct ifreq *ifr)
 {
 	int err;
-- 
1.7.2.5



* Re: [PATCH] ethtool: ETHTOOL_SFEATURES: remove NETIF_F_COMPAT return
From: Michał Mirosław @ 2011-05-16 20:51 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, David Miller
In-Reply-To: <1305557597.2885.5.camel@bwh-desktop>

On Mon, May 16, 2011 at 03:53:17PM +0100, Ben Hutchings wrote:
> On Mon, 2011-05-16 at 16:23 +0200, Michał Mirosław wrote:
> > On Mon, May 16, 2011 at 02:37:46PM +0100, Ben Hutchings wrote:
> > > On Mon, 2011-05-16 at 15:28 +0200, Michał Mirosław wrote:
> > > > Remove NETIF_F_COMPAT since it's redundant and will be unused after
> > > > all drivers are converted to fix/set_features.
> > > > 
> > > > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > > > ---
> > > > 
> > > > For net as we don't want to have ETHTOOL_F_COMPAT hit stable release.
> > > [...]
> > > ETHTOOL_F_WISH means that the requested features could not all be
> > > enabled, *but are remembered*.  ETHTOOL_F_COMPAT means they were not
> > > remembered.
> > Hmm. So, lets just revert 39fc0ce5710c53bad14aaba1a789eec810c556f9
> > (net: Implement SFEATURES compatibility for not updated drivers).
> That's also problematic because it means we can't make any use of the
> 'available' masks from ETHTOOL_GFEATURES.
> 
> The patch I sent is actually tested with a modified ethtool.  The
> fallback works.  I don't think you've tested whether any of your
> proposals can actually practically be used by ethtool.

While reading your patches I noted some differences in the way we see
the new [GS]FEATURES ops.

First, you make NETIF_F_* flags part of the ethtool ABI. In my approach
feature names become an ABI instead. That's what ETH_SS_FEATURES string
set is for, and that's what comments in kernel's <linux/ethtool.h>
include say.
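
As a rough sketch of what "names as ABI" means for userspace: the tool
looks a name up in the string table it got from the kernel and derives
the word and bit from the index, never from a hardcoded flag value. The
function and struct names below are hypothetical; the table is assumed
to be laid out as fixed 32-byte entries, as with ETH_GSTRING_LEN.

```c
#include <stdint.h>
#include <string.h>

#define GSTRING_LEN 32	/* assumed equal to ETH_GSTRING_LEN */

struct feature_pos {
	unsigned int word;	/* index of the 32-bit block */
	uint32_t mask;		/* bit within that block */
};

/*
 * Look up `name` in a table of n_strings fixed-width entries.
 * Returns 0 and fills *pos on success, -1 if the name is not present.
 */
static int feature_lookup(const char *table, unsigned int n_strings,
			  const char *name, struct feature_pos *pos)
{
	unsigned int i;

	for (i = 0; i < n_strings; i++) {
		const char *entry = table + (size_t)i * GSTRING_LEN;

		if (!entry[0])
			continue;	/* unused bit, no name assigned */

		if (strncmp(entry, name, GSTRING_LEN) == 0) {
			pos->word = i / 32;
			pos->mask = (uint32_t)1 << (i % 32);
			return 0;
		}
	}
	return -1;
}
```

A binary built this way keeps working if a feature moves to a different
bit, as long as its name stays the same.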

dev->features are exposed directly by kernel only in two ways:
 1. /sys/class/net/*/features - since NETIF_F_* flags are not exported
    in headers for userspace, this should be treated like a debugging
    facility and not an ABI
 2. ETHTOOL_[GS]FLAGS - these export 5 flags (LRO, VLAN offload, NTuple,
    and RX hashing) that are renamed to ETH_FLAG_* - only those constants
    are in the ABI and only in relation with ETHTOOL_[GS]FLAGS

Second, you reimplement 'ethtool -K' using ETHTOOL_SFEATURES. Does this mean
that we want to get rid of ETHTOOL_[GS]{FLAGS,SG,...} from kernel? The
assumptions in those calls are a bit different from ETHTOOL_[GS]FEATURES
but there is a conversion layer in kernel that allows old binaries to
work correctly in the common case. (-EOPNOTSUPP is still returned for
drivers which can't change particular feature. The difference is seen
only in that disabling and enabling e.g. checksumming won't disable other
dependent features in the result.)

Right now we already agree that NETIF_F_COMPAT should go.

I'll send my idea of the ethtool code using ETHTOOL_[GS]FEATURES and
keeping NETIF_F_* flags internal to the kernel. It adds new modes (-w/-W).
This might be made even more useful by adding simple wildcard matching.

Best Regards,
Michał Mirosław


* Re: kernel bug relating to networking
From: James @ 2011-05-16 20:47 UTC (permalink / raw)
  Cc: Kernel Mailing List, netdev
In-Reply-To: <BANLkTikF3ntwWjV3iuwwCnJhZKJMVHHygQ@mail.gmail.com>

On 05/16/11 11:47, Daniel Baluta wrote:
> On Mon, May 16, 2011 at 6:35 PM, James <bjlockie@lockie.ca> wrote:
>> I originally posted to linux-net@vger.kernel.org but all that list
> This should be netdev@vger.kernel.org.
That is confusing since the welcome message said: "welcome message for
linux-net@vger.kernel.org".

Both lists are listed at http://vger.kernel.org/vger-lists.html but it
doesn't say the difference.


* Re: [PATCH V5 4/6 net-next] vhost: vhost TX zero-copy support
From: Michael S. Tsirkin @ 2011-05-16 20:45 UTC (permalink / raw)
  To: Shirley Ma
  Cc: David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann, netdev,
	kvm, linux-kernel
In-Reply-To: <1305574484.3456.30.camel@localhost.localdomain>

> +/* Since we need to keep the order of used_idx as avail_idx, it's possible that
> + * DMA done not in order in lower device driver for some reason. To prevent
> + * used_idx out of order, upend_idx is used to track avail_idx order, done_idx
> + * is used to track used_idx order. Once lower device DMA done, then upend_idx
> + * can move to done_idx.

Could you clarify this please? virtio explicitly allows out of order
completion of requests. Does it simplify code that we try to keep
used index updates in-order? Because if not, this is not
really a requirement.

-- 
MST
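
For readers following the thread, the scheme in the quoted comment can be
sketched in user-space C roughly as below. This is only an illustrative model
and not the vhost code: the `zc_ring` struct, `RING_SIZE`, and the helper names
are hypothetical; only `upend_idx` and `done_idx` are taken from the patch
comment. Slots may complete in any order, but they are retired strictly in
submission order.

```c
#include <stdbool.h>

#define RING_SIZE 8 /* hypothetical ring size for this sketch */

/* Hypothetical model: upend_idx is the next slot handed to the lower
 * device, done_idx is the next slot to be reported back as used. */
struct zc_ring {
	bool dma_done[RING_SIZE];
	unsigned int upend_idx;
	unsigned int done_idx;
};

/* Submit one buffer and return the slot it occupies.
 * No full-ring check: callers keep fewer than RING_SIZE in flight. */
static unsigned int zc_submit(struct zc_ring *r)
{
	unsigned int slot = r->upend_idx;

	r->dma_done[slot] = false;
	r->upend_idx = (slot + 1) % RING_SIZE;
	return slot;
}

/* The lower device may complete slots in any order. */
static void zc_complete(struct zc_ring *r, unsigned int slot)
{
	r->dma_done[slot] = true;
}

/* Retire completed slots strictly in submission order and return how
 * many entries could be reported as used. */
static unsigned int zc_retire_in_order(struct zc_ring *r)
{
	unsigned int n = 0;

	while (r->done_idx != r->upend_idx && r->dma_done[r->done_idx]) {
		r->done_idx = (r->done_idx + 1) % RING_SIZE;
		n++;
	}
	return n;
}
```

In this model a slot that completes early simply waits until every earlier
slot has completed. As the reply above points out, virtio itself allows
out-of-order completion, so this ordering is a design choice in the sketch
rather than a requirement.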

^ permalink raw reply

* Re: [PATCH 1/1] igmp: fix ip_mc_clear_src to not reset ip_mc_list->sf{mode,count}
From: David Stevens @ 2011-05-16 20:42 UTC (permalink / raw)
  To: David Miller
  Cc: jmorris, kaber, kuznet, linux-kbuild, linux-kernel, mmarek,
	netdev, pekkas, vfalico, yoshfuji
In-Reply-To: <20110516.140359.111037536766782557.davem@davemloft.net>

> From: Veaceslav Falico <vfalico@redhat.com>
> Date: Sun, 15 May 2011 18:59:45 +0200
> 
> > ip_mc_clear_src resets the imc->sfcount and imc->sfmode, without taking into
> > account the current number of sockets listening on that multicast struct, which
> > can lead to bogus routes for local listeners.
> > 
> > On NETDEV_DOWN/UP event, if there were 3 multicast listeners for that interface's
> > address, the imc->sfcount[MCAST_EXCLUDE] will be reset to 1. And after that a
> > listener socket destroys, multicast traffic will not be delivered to local
> > listeners because __mkroute_output drops the local flag for the route (by
> > checking ip_check_mc).

        On NETDEV_DOWN, all group memberships are dropped. ip_mc_clear_src()
is simply freeing all the source filters and turning it into an "EXCLUDE
nobody" membership (i.e., the same as an ordinary join without source
filtering). This ordinarily happens when you are deleting the group entirely
(when the reference count goes to 0), but is also called on device down.
        This patch is not appropriate; when the groups are deleted, the source
filters are deleted, and the filter counts have to reflect the source filters
in the list. If you had an "INCLUDE A" filter, for example, that would become
an "INCLUDE nobody" filter and drop all traffic (from A or not). The number
of source filters is not related to the number of listener sockets, and the
function of ip_mc_clear_src() is to make it 0 (with the special case of 1 for
EXCLUDE), so setting the counts has to be done for proper functioning.
        I don't quite understand the problem you're trying to solve here --
when the device comes back up, the group should be re-added with
{EXCLUDE,nobody} and ip_check_mc() should therefore return 1. Of course,
while the interface is down, the mc_list is empty and it'd return 0 in that
case.
        Do you have a small test program to demonstrate the problem?

        For the patch, I have to say NACK.

                                                                +-DLS
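
The argument above can be illustrated with a small user-space model. It is a
simplified sketch, not the kernel's code: `struct mc_group`,
`struct src_filter`, `mc_clear_src()` and `mc_accepts()` are hypothetical
names, and the acceptance rule is only an approximation of what ip_check_mc()
is described as doing. The enum values are assumed to match MCAST_EXCLUDE and
MCAST_INCLUDE in the kernel headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Filter modes; values assumed to match the kernel's definitions. */
enum { MCAST_EXCLUDE = 0, MCAST_INCLUDE = 1 };

/* Hypothetical per-source entry with per-mode reference counts. */
struct src_filter {
	unsigned int addr;
	unsigned int count[2];
};

/* Hypothetical group membership with its source filter list. */
struct mc_group {
	struct src_filter *sources;
	size_t nsources;
	unsigned int sfcount[2];
	int sfmode;
};

/* Drop all source filters and fall back to "EXCLUDE nobody". */
static void mc_clear_src(struct mc_group *g)
{
	g->sources = NULL; /* the sketch does not own the array */
	g->nsources = 0;
	g->sfmode = MCAST_EXCLUDE;
	g->sfcount[MCAST_INCLUDE] = 0;
	g->sfcount[MCAST_EXCLUDE] = 1;
}

/* Return true if traffic from src should be delivered locally. */
static bool mc_accepts(const struct mc_group *g, unsigned int src)
{
	for (size_t i = 0; i < g->nsources; i++) {
		const struct src_filter *f = &g->sources[i];

		if (f->addr == src)
			return f->count[MCAST_INCLUDE] ||
			       f->count[MCAST_EXCLUDE] != g->sfcount[MCAST_EXCLUDE];
	}
	/* Source not listed: accepted only if an EXCLUDE-mode member exists. */
	return g->sfcount[MCAST_EXCLUDE] != 0;
}
```

In this model, emptying the source list of an "INCLUDE A" group without
touching the counts leaves an "INCLUDE nobody" group that rejects every
source, whereas mc_clear_src() yields a group that accepts all sources.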



^ permalink raw reply

* Re: [PATCH] net: Change netdev_fix_features messages loglevel
From: Michael S. Tsirkin @ 2011-05-16 20:37 UTC (permalink / raw)
  To: David Miller; +Cc: mirq-linux, netdev, herbert, bhutchings
In-Reply-To: <20110516.151434.829498612745581899.davem@davemloft.net>

On Mon, May 16, 2011 at 03:14:34PM -0400, David Miller wrote:
> From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> Date: Mon, 16 May 2011 15:17:57 +0200 (CEST)
> 
> > Those reduced to DEBUG can possibly be triggered by unprivileged processes
> > and are nothing exceptional. Illegal checksum combinations can only be
> > caused by driver bug, so promote those messages to WARN.
> > 
> > Since GSO without SG will now only cause DEBUG message from
> > netdev_fix_features(), remove the workaround from register_netdevice().
> > 
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> 
> Applied, thanks.

Cool, how about we make 'Features changed' debug as well?
This way userspace can't fill up the log just by tweaking tun features
with an ioctl.

Untested, but you get the message.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

diff --git a/net/core/dev.c b/net/core/dev.c
index 3ed09f8..538a1fe 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5270,7 +5270,7 @@ int __netdev_update_features(struct net_device *dev)
 	if (dev->features == features)
 		return 0;
 
-	netdev_info(dev, "Features changed: 0x%08x -> 0x%08x\n",
+	netdev_dbg(dev, "Features changed: 0x%08x -> 0x%08x\n",
 		dev->features, features);
 
 	if (dev->netdev_ops->ndo_set_features)
-- 
MST

^ permalink raw reply related

* [GIT] Networking
From: David Miller @ 2011-05-16 20:29 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) SFC crashes on 'ethtool -d', from Ben Hutchings.

2) VMXNET3 driver doesn't set LRO correctly on probe, this hits a lot
   of VMWare users.  Fix from Thomas Jarosch.

3) IPVS needs to use correct SEQ release interface, otherwise it leaks
   netns references.  Fix from Hans Schillstrom.

4) Users can (via tun/tap for example) trigger the warnings generated
   by the netdevice feature validation checks we make.  Make such
   warnings of DEBUG level so users can't spam the logs.  Fix from
   Michał Mirosław.

There is a dup of bridge netfilter fix (already in your tree) with
commit ID cb68552858c64db302771469b1202ea09e696329 in here because
Pablo applied it to the netfilter tree as well (and in his tree it
appears as commit d8083deb4f1aa0977980dfb834fcc336ef38318f).  So when
I merged his tree to get the IPVS namespace leak fix, I got the bridge
netfilter fix dup too.

Please pull, thanks a lot!

The following changes since commit df8d06ade6eed9077f658ac8696fc1cb5c081220:

  Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6 (2011-05-16 08:55:49 -0700)

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Ben Hutchings (1):
      sfc: Fix oops in register dump after mapping change

David S. Miller (1):
      Merge branch 'pablo/nf-2.6-updates' of git://1984.lsi.us.es/net-2.6

Hans Schillstrom (1):
      IPVS: fix netns if reading ip_vs_* procfs entries

Michał Mirosław (1):
      net: Change netdev_fix_features messages loglevel

Stephen Hemminger (1):
      bridge: fix forwarding of IPv6

Thomas Jarosch (1):
      vmxnet3: Fix inconsistent LRO state after initialization

 drivers/net/sfc/nic.c                 |    7 +++++++
 drivers/net/vmxnet3/vmxnet3_ethtool.c |    3 +++
 net/core/dev.c                        |   22 ++++++++--------------
 net/netfilter/ipvs/ip_vs_app.c        |    2 +-
 net/netfilter/ipvs/ip_vs_conn.c       |    4 ++--
 net/netfilter/ipvs/ip_vs_ctl.c        |    6 +++---
 6 files changed, 24 insertions(+), 20 deletions(-)

^ permalink raw reply

* Re: several packets in a single buffer in Rx
From: Ben Hutchings @ 2011-05-16 20:15 UTC (permalink / raw)
  To: Tomas Winkler
  Cc: Emmanuel Grumbach, Michał Mirosław,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA, Johannes Berg, Guy, Wey-Yi,
	guy.cohen-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <BANLkTimvCVcQX-7teQgG7EQV9eeLgG3JnQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Mon, 2011-05-16 at 23:04 +0300, Tomas Winkler wrote:
> 2011/5/16 Emmanuel Grumbach <egrumbach-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> > 2011/5/16 Michał Mirosław <mirqus-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> >> On 16 May 2011 at 14:59, Emmanuel Grumbach
> >> <egrumbach-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >>> 2011/5/16 Michał Mirosław <mirqus-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> >>>> 2011/5/16 Emmanuel Grumbach <egrumbach-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> >>>>> I would like to be able to deliver the same page several times to the
> >>>>> stack without having the stack consume it before the last time I
> >>>>> deliver it.
> >>>>> Of course I would like to avoid cloning it.
> >>>>
> >>>> Just do get_page() on the page having another packet in it before
> >>>> passing skb up.
> >>>>
> >>>
> >>> I can see the path:
> >>> __kfree_skb -> skb_release_all -> skb_release_data -> put_page
> >>> put_page will free the page iff the _count variable reaches 0. Of course,
> >>> _count is incremented by get_page.
> >>>
> >>> I will give it try.
> >>>
> >>> I understand that this will work regardless the order given to
> >>> alloc_pages right ?
> >>
> >> Yes. Remember that if you put a lot of packets in a big-order page
> >> then the memory will be freed only after all packets are freed.
> >
> > Sure. Thanks for the help.
> 
> How is it ensured that skb manipulation won't corrupt another packet
> on the same page?

Fragments attached to an skb are always treated as read-only.  The
headers will be copied into the skb's header buffer and may be modified
there.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
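
The reference counting discussed in this thread can be modelled in user space
as follows. This is a minimal sketch under stated assumptions: `struct
fake_page` and the `fake_*` helpers are hypothetical stand-ins for struct
page, get_page() and put_page(), and only the counting behaviour is modelled,
not real memory management.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page: just a reference count. */
struct fake_page {
	int count;
	bool freed;
};

/* Take one more reference on the page. */
static void fake_get_page(struct fake_page *p)
{
	p->count++;
}

/* Drop one reference; the page is released when the last one goes. */
static void fake_put_page(struct fake_page *p)
{
	if (--p->count == 0)
		p->freed = true;
}

/* Hand one packet that lives in this page to the stack. The extra
 * reference is dropped again when the stack frees that packet. */
static void deliver_packet(struct fake_page *p)
{
	fake_get_page(p);
}
```

With one reference held by the driver and one taken per delivered packet, the
page is only released after the driver and every packet have dropped their
reference, which matches the remark that a big-order page holding many packets
is freed only after all packets are freed.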

^ permalink raw reply

* Re: several packets in a single buffer in Rx
From: Tomas Winkler @ 2011-05-16 20:04 UTC (permalink / raw)
  To: Emmanuel Grumbach
  Cc: Michał Mirosław, netdev, linux-wireless, Johannes Berg,
	Guy, Wey-Yi, guy.cohen
In-Reply-To: <BANLkTin_stn+7Ja3hbuaOgtrEVJEoT9Lhw@mail.gmail.com>

2011/5/16 Emmanuel Grumbach <egrumbach@gmail.com>:
> 2011/5/16 Michał Mirosław <mirqus@gmail.com>:
>> On 16 May 2011 at 14:59, Emmanuel Grumbach
>> <egrumbach@gmail.com> wrote:
>>> 2011/5/16 Michał Mirosław <mirqus@gmail.com>:
>>>> 2011/5/16 Emmanuel Grumbach <egrumbach@gmail.com>:
>>>>> I would like to be able to deliver the same page several times to the
>>>>> stack without having the stack consume it before the last time I
>>>>> deliver it.
>>>>> Of course I would like to avoid cloning it.
>>>>
>>>> Just do get_page() on the page having another packet in it before
>>>> passing skb up.
>>>>
>>>
>>> I can see the path:
>>> __kfree_skb -> skb_release_all -> skb_release_data -> put_page
>>> put_page will free the page iff the _count variable reaches 0. Of course,
>>> _count is incremented by get_page.
>>>
>>> I will give it try.
>>>
>>> I understand that this will work regardless the order given to
>>> alloc_pages right ?
>>
>> Yes. Remember that if you put a lot of packets in a big-order page
>> then the memory will be freed only after all packets are freed.
>
> Sure. Thanks for the help.

How is it ensured that skb manipulation won't corrupt another packet
on the same page?

Thanks
Tomas

^ permalink raw reply

* Re: [PATCH V5 2/6 net-next] netdevice.h: Add zero-copy flag in netdevice
From: Ben Hutchings @ 2011-05-16 19:47 UTC (permalink / raw)
  To: Shirley Ma
  Cc: David Miller, mst, Eric Dumazet, Avi Kivity, Arnd Bergmann,
	netdev, kvm, linux-kernel
In-Reply-To: <1305574680.3456.33.camel@localhost.localdomain>

On Mon, 2011-05-16 at 12:38 -0700, Shirley Ma wrote:
> On Mon, 2011-05-16 at 20:35 +0100, Ben Hutchings wrote:
> > Sorry, bit 31 is taken.  You get the job of turning features into a
> > wider bitmap.
> 
> :) will do it.

Bear in mind that feature masks are manipulated in many different
places.  This is not a simple task.

See previous discussion at:
http://thread.gmane.org/gmane.linux.network/193284
and especially:
http://thread.gmane.org/gmane.linux.network/193284/focus=193332

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH V5 6/6 net-next] example: enable zero-copy support in ixgbe
From: Shirley Ma @ 2011-05-16 19:42 UTC (permalink / raw)
  To: David Miller, mst, Eric Dumazet, Avi Kivity, Arnd Bergmann
  Cc: netdev, kvm, linux-kernel

Device can enable zero-copy flag when HIGHDMA is supported.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/net/ixgbe/ixgbe_main.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 2dce3d0..7e9e881 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -7553,6 +7553,10 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 		netdev->vlan_features |= NETIF_F_HIGHDMA;
 	}
 
+	/* enable zero-copy when device supports HIGHDMA */
+	if (netdev->features & NETIF_F_HIGHDMA)
+		netdev->features |= NETIF_F_ZEROCOPY;
+
 	if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)
 		netdev->features |= NETIF_F_LRO;
 

^ permalink raw reply related

* Re: linux-next: Tree for May 16 (net/ipv4/ping)
From: David Miller @ 2011-05-16 19:38 UTC (permalink / raw)
  To: randy.dunlap; +Cc: sfr, netdev, linux-next, linux-kernel, segoon
In-Reply-To: <20110516123534.5d3a51b9.randy.dunlap@oracle.com>

From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Mon, 16 May 2011 12:35:34 -0700

> On Mon, 16 May 2011 15:10:19 +1000 Stephen Rothwell wrote:
> 
>> Hi all,
>> 
>> Changes since 20110513:
> 
> 
> when CONFIG_PROC_SYSCTL is not enabled:
> 
> ping.c:(.text+0x52af3): undefined reference to `inet_get_ping_group_range_net'

Vasiliy, please fix this.

^ permalink raw reply

* Re: [PATCH V5 2/6 net-next] netdevice.h: Add zero-copy flag in netdevice
From: Shirley Ma @ 2011-05-16 19:38 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, mst, Eric Dumazet, Avi Kivity, Arnd Bergmann,
	netdev, kvm, linux-kernel
In-Reply-To: <1305574518.2885.25.camel@bwh-desktop>

On Mon, 2011-05-16 at 20:35 +0100, Ben Hutchings wrote:
> Sorry, bit 31 is taken.  You get the job of turning features into a
> wider bitmap.

:) will do it.

Thanks
Shirley

^ permalink raw reply

* [PATCH V5 5/6 net-next] macvtap: macvtap TX zero-copy support
From: Shirley Ma @ 2011-05-16 19:36 UTC (permalink / raw)
  To: David Miller, mst, Eric Dumazet, Avi Kivity, Arnd Bergmann
  Cc: netdev, kvm, linux-kernel

macvtap enables zero-copy only when the buffer size is greater than
GOODCOPY_LEN (256).

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/net/macvtap.c |  129 ++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 118 insertions(+), 11 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 6696e56..145e51c 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -60,6 +60,7 @@ static struct proto macvtap_proto = {
  */
 static dev_t macvtap_major;
 #define MACVTAP_NUM_DEVS 65536
+#define GOODCOPY_LEN 256
 static struct class *macvtap_class;
 static struct cdev macvtap_cdev;
 
@@ -340,6 +341,7 @@ static int macvtap_open(struct inode *inode, struct file *file)
 {
 	struct net *net = current->nsproxy->net_ns;
 	struct net_device *dev = dev_get_by_index(net, iminor(inode));
+	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct macvtap_queue *q;
 	int err;
 
@@ -369,6 +371,15 @@ static int macvtap_open(struct inode *inode, struct file *file)
 	q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP;
 	q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
 
+	/*
+	 * so far only KVM virtio_net uses macvtap, enable zero copy between
+	 * guest kernel and host kernel when lower device supports zerocopy
+	 */
+	if (vlan) {
+		if (vlan->lowerdev->features & NETIF_F_ZEROCOPY)
+			sock_set_flag(&q->sk, SOCK_ZEROCOPY);
+	}
+
 	err = macvtap_set_queue(dev, file, q);
 	if (err)
 		sock_put(&q->sk);
@@ -433,6 +444,80 @@ static inline struct sk_buff *macvtap_alloc_skb(struct sock *sk, size_t prepad,
 	return skb;
 }
 
+/* set skb frags from iovec, this can move to core network code for reuse */
+static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
+				  int offset, size_t count)
+{
+	int len = iov_length(from, count) - offset;
+	int copy = skb_headlen(skb);
+	int size, offset1 = 0;
+	int i = 0;
+	skb_frag_t *f;
+
+	/* Skip over from offset */
+	while (count && (offset >= from->iov_len)) {
+		offset -= from->iov_len;
+		++from;
+		--count;
+	}
+
+	/* copy up to skb headlen */
+	while (count && (copy > 0)) {
+		size = min_t(unsigned int, copy, from->iov_len - offset);
+		if (copy_from_user(skb->data + offset1, from->iov_base + offset,
+				   size))
+			return -EFAULT;
+		if (copy > size) {
+			++from;
+			--count;
+		}
+		copy -= size;
+		offset1 += size;
+		offset = 0;
+	}
+
+	if (len == offset1)
+		return 0;
+
+	while (count--) {
+		struct page *page[MAX_SKB_FRAGS];
+		int num_pages;
+		unsigned long base;
+
+		len = from->iov_len - offset1;
+		if (!len) {
+			offset1 = 0;
+			++from;
+			continue;
+		}
+		base = (unsigned long)from->iov_base + offset1;
+		size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
+		num_pages = get_user_pages_fast(base, size, 0, &page[i]);
+		if ((num_pages != size) ||
+		    (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags))
+			/* put_page is in skb free */
+			return -EFAULT;
+		skb->data_len += len;
+		skb->len += len;
+		skb->truesize += len;
+		atomic_add(len, &skb->sk->sk_wmem_alloc);
+		while (len) {
+			f = &skb_shinfo(skb)->frags[i];
+			f->page = page[i];
+			f->page_offset = base & ~PAGE_MASK;
+			f->size = min_t(int, len, PAGE_SIZE - f->page_offset);
+			skb_shinfo(skb)->nr_frags++;
+			/* increase sk_wmem_alloc */
+			base += f->size;
+			len -= f->size;
+			i++;
+		}
+		offset1 = 0;
+		++from;
+	}
+	return 0;
+}
+
 /*
  * macvtap_skb_from_vnet_hdr and macvtap_skb_to_vnet_hdr should
  * be shared with the tun/tap driver.
@@ -515,16 +600,18 @@ static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
 
 
 /* Get packet from user space buffer */
-static ssize_t macvtap_get_user(struct macvtap_queue *q,
-				const struct iovec *iv, size_t count,
-				int noblock)
+static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
+				const struct iovec *iv, unsigned long total_len,
+				size_t count, int noblock)
 {
 	struct sk_buff *skb;
 	struct macvlan_dev *vlan;
-	size_t len = count;
+	unsigned long len = total_len;
 	int err;
 	struct virtio_net_hdr vnet_hdr = { 0 };
 	int vnet_hdr_len = 0;
+	int copylen;
+	bool zerocopy = false;
 
 	if (q->flags & IFF_VNET_HDR) {
 		vnet_hdr_len = q->vnet_hdr_sz;
@@ -552,12 +639,32 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q,
 	if (unlikely(len < ETH_HLEN))
 		goto err;
 
-	skb = macvtap_alloc_skb(&q->sk, NET_IP_ALIGN, len, vnet_hdr.hdr_len,
-				noblock, &err);
+	if (m && m->msg_control)
+		zerocopy = true;
+
+	if (zerocopy) {
+		/* There are 256 bytes to be copied in skb, so there is enough
+		 * room for skb expand head in case it is used.
+		 * The rest buffer is mapped from userspace.
+		 */
+		copylen = vnet_hdr.hdr_len;
+		if (!copylen)
+			copylen = GOODCOPY_LEN;
+	} else
+		copylen = len;
+
+	skb = macvtap_alloc_skb(&q->sk, NET_IP_ALIGN, copylen,
+				vnet_hdr.hdr_len, noblock, &err);
 	if (!skb)
 		goto err;
 
-	err = skb_copy_datagram_from_iovec(skb, 0, iv, vnet_hdr_len, len);
+	if (zerocopy) {
+		err = zerocopy_sg_from_iovec(skb, iv, vnet_hdr_len, count);
+		memcpy(&skb_shinfo(skb)->ubuf, m->msg_control,
+		       sizeof(struct skb_ubuf_info));
+	} else
+		err = skb_copy_datagram_from_iovec(skb, 0, iv, vnet_hdr_len,
+						   len);
 	if (err)
 		goto err_kfree;
 
@@ -579,7 +686,7 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q,
 		kfree_skb(skb);
 	rcu_read_unlock_bh();
 
-	return count;
+	return total_len;
 
 err_kfree:
 	kfree_skb(skb);
@@ -601,8 +708,8 @@ static ssize_t macvtap_aio_write(struct kiocb *iocb, const struct iovec *iv,
 	ssize_t result = -ENOLINK;
 	struct macvtap_queue *q = file->private_data;
 
-	result = macvtap_get_user(q, iv, iov_length(iv, count),
-			      file->f_flags & O_NONBLOCK);
+	result = macvtap_get_user(q, NULL, iv, iov_length(iv, count), count,
+				  file->f_flags & O_NONBLOCK);
 	return result;
 }
 
@@ -815,7 +922,7 @@ static int macvtap_sendmsg(struct kiocb *iocb, struct socket *sock,
 			   struct msghdr *m, size_t total_len)
 {
 	struct macvtap_queue *q = container_of(sock, struct macvtap_queue, sock);
-	return macvtap_get_user(q, m->msg_iov, total_len,
+	return macvtap_get_user(q, m, m->msg_iov, total_len, m->msg_iovlen,
 			    m->msg_flags & MSG_DONTWAIT);
 }
 

^ permalink raw reply related
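
Two pieces of arithmetic in the macvtap patch above can be checked in
isolation. The sketch below is a user-space illustration, not part of the
patch: the `SK_*` macros and both function names are hypothetical, and a
4 KiB page size is assumed. `pages_spanned()` mirrors the expression the
patch uses to size the get_user_pages_fast() request, and `pick_copylen()`
mirrors how the patch chooses the linear copy length.

```c
#include <stdbool.h>

#define SK_PAGE_SHIFT 12                      /* assumes 4 KiB pages */
#define SK_PAGE_SIZE  (1UL << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))
#define GOODCOPY_LEN  256UL                   /* value from the patch */

/* Number of pages touched by the byte range [base, base + len). */
static unsigned long pages_spanned(unsigned long base, unsigned long len)
{
	return ((base & ~SK_PAGE_MASK) + len + ~SK_PAGE_MASK) >> SK_PAGE_SHIFT;
}

/* Bytes copied into the skb's linear area: everything when zero-copy is
 * off, otherwise the header length or GOODCOPY_LEN if none was given. */
static unsigned long pick_copylen(bool zerocopy, unsigned long hdr_len,
				  unsigned long len)
{
	if (!zerocopy)
		return len;
	return hdr_len ? hdr_len : GOODCOPY_LEN;
}
```

For example, a 2-byte buffer that starts on the last byte of a page spans two
pages, while a page-aligned 4096-byte buffer spans exactly one.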

