Netdev List

* Re: [patch net-next] virtio_net: allow to change mac when iface is running
From: Jiri Pirko @ 2012-06-28  6:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, virtualization, brouer, mst
In-Reply-To: <20120627.213046.1244710404799995026.davem@davemloft.net>

Thu, Jun 28, 2012 at 06:30:46AM CEST, davem@davemloft.net wrote:
>From: Jiri Pirko <jpirko@redhat.com>
>Date: Wed, 27 Jun 2012 17:27:46 +0200
>
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>Applied, but this seriously makes eth_mac_addr() completely useless.
>
>Technically, every eth_mac_addr() user in a software/virtual device
>should behave the way virtio_net does now.

I guess so. But for some HW devices eth_mac_addr() is needed (when they
do not support a "live" MAC change).

>
>It therefore probably makes sense to add a boolean arg which when true
>elides the netif_running() check then fixup and audit every caller.

I was thinking about this. Maybe add __eth_mac_addr(), which does not
have the netif_running() check, and have eth_mac_addr() do the
netif_running() check and then call __eth_mac_addr().
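
Roughly, as a userspace sketch of that split. Everything here is made up
for illustration: struct sketch_netdev, its "running" flag (standing in for
netif_running()) and the sketch_* function names are assumptions, not kernel
code. The error codes follow what I understand eth_mac_addr() to return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for struct net_device; not the kernel structure. */
struct sketch_netdev {
	bool running;                      /* stands in for netif_running() */
	unsigned char dev_addr[ETH_ALEN];
};

/* Reject multicast (low bit of first octet set) and all-zero addresses. */
static bool sketch_is_valid_ether_addr(const unsigned char *a)
{
	static const unsigned char zero[ETH_ALEN];

	return !(a[0] & 1) && memcmp(a, zero, ETH_ALEN) != 0;
}

/* The "__" variant: no running check, for devices that can change MAC live. */
static int sketch_eth_mac_addr_nocheck(struct sketch_netdev *dev,
				       const unsigned char *addr)
{
	assert(dev && addr);
	if (!sketch_is_valid_ether_addr(addr))
		return -EADDRNOTAVAIL;
	memcpy(dev->dev_addr, addr, ETH_ALEN);
	return 0;
}

/* The checked variant: refuse while the interface is up, then delegate. */
static int sketch_eth_mac_addr(struct sketch_netdev *dev,
			       const unsigned char *addr)
{
	if (dev->running)
		return -EBUSY;
	return sketch_eth_mac_addr_nocheck(dev, addr);
}
```

A software/virtual driver would then point its set-MAC hook at the unchecked
variant, and HW drivers that cannot change the address live would keep the
checked one.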

What do you think?

Jirka

^ permalink raw reply

* Re: [net-next RFC V3 PATCH 4/6] tuntap: multiqueue support
From: Jason Wang @ 2012-06-28  5:31 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: Michael S. Tsirkin, habanero, netdev, linux-kernel, krkumar2,
	tahm, akong, davem, shemminger, mashirle
In-Reply-To: <4FEBE2FF.3010105@us.ibm.com>

On 06/28/2012 12:52 PM, Sridhar Samudrala wrote:
> On 6/27/2012 8:02 PM, Jason Wang wrote:
>> On 06/27/2012 04:44 PM, Michael S. Tsirkin wrote:
>>> On Wed, Jun 27, 2012 at 01:16:30PM +0800, Jason Wang wrote:
>>>> On 06/26/2012 06:42 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Jun 26, 2012 at 11:42:17AM +0800, Jason Wang wrote:
>>>>>> On 06/25/2012 04:25 PM, Michael S. Tsirkin wrote:
>>>>>>> On Mon, Jun 25, 2012 at 02:10:18PM +0800, Jason Wang wrote:
>>>>>>>> This patch adds multiqueue support for tap device. This is done by
>>>>>>>> abstracting each queue as a file/socket and allowing multiple sockets
>>>>>>>> to be attached to the tuntap device (an array of tun_file is stored in
>>>>>>>> the tun_struct). Userspace can write to and read from those files to
>>>>>>>> do parallel packet sending/receiving.
>>>>>>>>
>>>>>>>> Unlike the previous single queue implementation, the socket and device
>>>>>>>> are loosely coupled; either of them is allowed to go away first. In
>>>>>>>> order to make the tx path lockless, netif_tx_lock_bh() is replaced by
>>>>>>>> RCU/NETIF_F_LLTX to synchronize between data path and system call.
>>>>>>> Don't use LLTX/RCU. It's not worth it.
>>>>>>> Use something like netif_set_real_num_tx_queues.
>>>>>>>
>>>>>>>> The tx queue selection is first based on the recorded rxq index of an
>>>>>>>> skb; if there is none, it falls back to rx hashing
>>>>>>>> (skb_get_rxhash()).
>>>>>>>>
>>>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>>>> Interestingly macvtap switched to hashing first:
>>>>>>> ef0002b577b52941fb147128f30bd1ecfdd3ff6d
>>>>>>> (the commit log is corrupted but see what it
>>>>>>> does in the patch).
>>>>>>> Any idea why?
>>>>>> Yes, so tap should be changed to behave the same as macvtap. I remember
>>>>>> the reason we do that is to make sure the packets of a single flow are
>>>>>> queued to a fixed socket/virtqueue. 10g cards like ixgbe choose the rx
>>>>>> queue for a flow based on the last tx queue that the packets of that
>>>>>> flow came from. So if we use the recorded rx queue in macvtap, the queue
>>>>>> index of a flow would change as the vhost thread moves among processors.
>>>>> Hmm. OTOH if you override this, if TX is sent from VCPU0, RX might land
>>>>> on VCPU1 in the guest, which is not good, right?
>>>> Yes, but that is better than making the rx move between vcpus when we
>>>> use the recorded rx queue.
>>> Why isn't this a problem with native TCP?
>>> I think what happens is one of the following:
>>> - moving between CPUs is more expensive with tun
>>>    because it can queue so much data on xmit
>>> - scheduler makes very bad decisions about VCPUs
>>>    bouncing them around all the time
>>
>> For a usual native TCP/host process, since it reads and writes tcp
>> sockets, it makes sense to move rx to the processor that the process
>> moves to. But vhost does not do tcp stuff, and ixgbe would still move rx
>> when the vhost process moves, and we can't even make sure that the vhost
>> process handling rx is running on the processor that handles the rx
>> interrupt.
>
> We also saw this behavior with the default ixgbe configuration. If vhost
> is pinned to a CPU, all packets for that VM are received on a single RX
> queue. So even if the VM is doing multiple TCP_RR sessions, packets for
> all the flows are received on a single RX queue. Without pinning, vhost
> moves around and so do the packets across the RX queues.
>
> I think
>         ethtool -K ethX ntuple on
> will disable this behavior and it should be possible to program the flow
> director using ethtool -U.
> This way we can split the packets across the host NIC RX queues based on
> the flows, but it is not clear if this would help with the current model
> of single vhost per device.
> With per-cpu vhost, each RX queue can be handled by the matching vhost,
> but if we have only 1 queue in the VM's virtio-net device, that could
> become the bottleneck.

Yes, I've been thinking about this. Instead of using ethtool -U (maybe
possible for macvtap but hard for tuntap), we can 'teach' ixgbe which rxq
it should use for a flow, because ixgbe_select_queue() first selects the
txq based on the recorded rxq. So if we want a flow to use a dedicated rxq,
say N, we can record N as the rxq in tuntap before passing the skb to the
bridge.
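
A simplified userspace sketch of that idea, not the ixgbe or tuntap code:
struct sketch_skb and the sketch_* helpers are hypothetical names, and the
modulo and multiply-shift reductions are assumptions about how an index is
folded into the valid txq range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few sk_buff fields that matter here. */
struct sketch_skb {
	bool     rxq_recorded;  /* was an rx queue index recorded? */
	uint16_t rxq;           /* the recorded rx queue index */
	uint32_t rxhash;        /* flow hash, used only as a fallback */
};

/* tuntap side: record the queue we want this flow to come back on. */
static void sketch_record_rx_queue(struct sketch_skb *skb, uint16_t q)
{
	skb->rxq = q;
	skb->rxq_recorded = true;
}

/* NIC side: prefer the recorded rxq, otherwise fall back to the hash. */
static uint16_t sketch_select_txq(const struct sketch_skb *skb,
				  uint16_t num_txq)
{
	assert(num_txq > 0);
	if (skb->rxq_recorded)
		return skb->rxq % num_txq;
	/* scale the 32-bit hash into [0, num_txq) */
	return (uint16_t)(((uint64_t)skb->rxhash * num_txq) >> 32);
}
```

With this ordering, recording N in tuntap pins the flow to txq N on the NIC
regardless of where the vhost thread is currently running; without a
recorded value the choice depends only on the flow hash.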

> Multi-queue virtio-net should help here, but we need the same number of
> queues in the VM's virtio-net device as in the host's NIC so that each
> vhost can handle the corresponding virtio queue.
> But if the VM has only 2 vcpus, I think it is not efficient to have 8
> virtio-net queues (to match a host with 8 physical cpus and 8 RX queues
> in the NIC).

Ideally, if we have 2 queues in the guest, it's better to use only 2 queues
in the host to avoid extra contention.
>
> Thanks
> Sridhar
>
>>
>>> Could we isolate which it is? Does the problem
>>> still happen if you pin VCPUs to host cpus?
>>> If not it's the queue depth.
>>
>> It may not help, as tun does not record the vcpu/queue that sent the
>> stream, so it can't transmit the packets back to the same vcpu/queue.
>>>> Flow steering is needed to make sure the tx and
>>>> rx are on the same vcpu.
>>> That involves IPI between processes, so it might be
>>> very expensive for kvm.
>>>
>>>>>> But while testing tun/tap, one interesting thing I found is that even
>>>>>> though ixgbe has recorded the queue index during rx, it seems to be
>>>>>> lost when tap tries to transmit skbs to userspace.
>>>>> dev_pick_tx does this I think but ndo_select_queue
>>>>> should be able to get it without trouble.
>>>>>
>>>>>
>

^ permalink raw reply

* Re: [PATCH net-next] ipv4: tcp: dont cache unconfirmed intput dst
From: David Miller @ 2012-06-28  5:22 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, hans.schillstrom
In-Reply-To: <1340860399.26242.206.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 28 Jun 2012 07:13:19 +0200

> On Thu, 2012-06-28 at 07:08 +0200, Eric Dumazet wrote:
> 
>> The initial idea was to perform this only for SYN packets received on a
>> listener in SYNCOOKIE mode. I'll resend the patch when fully
>> implemented, instead of a forward patch.
>> 
> 
> s/forward/followup/
> 
> ;)

Ok :-)

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2012-06-28  5:21 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Pairing and deadlock fixes in bluetooth from Johan Hedberg.

2) Add device IDs for AR3011 and AR3012 bluetooth chips.  From
   Giancarlo Formicuccia and Marek Vasut.

3) Fix wireless regulatory deadlock, from Eliad Peller.

4) Fix full TX ring panic in bnx2x driver, from Eric Dumazet.

5) Revert the two commits that added skb_orphan_try(); it causes
   erratic bonding behavior with UDP clients, and the gains it
   used to give are mostly no longer happening due to how BQL
   works.  From Eric Dumazet.

6) It took two tries, but Thomas Graf fixed a problem wherein we
   registered ipv6 routing procfs files before their backend data
   were initialized properly.

7) Fix max GSO size setting in be2net, from Sarveshwar Bandi.

8) PHY device id mask is wrong for KSZ9021 and KS8001 chips, fix
   from Jason Wang.

9) Fix use of stale SKB data pointer after skb_linearize() call in
   batman-adv, from Antonio Quartulli.

10) Fix memory leak in IXGBE due to missing __GFP_COMP, from Alexander
    Duyck.

11) Fix probing of Gobi devices in qmi_wwan usbnet driver, from Bjørn Mork.

12) Fix suspend/resume and open failure handling in usbnet from Ming
    Lei.

13) Attempt to fix device r8169 hangs for certain chips, from Francois
    Romieu.

14) Fix advancement of RX dirty pointer in some situations in sh_eth
    driver, from Yoshihiro Shimoda.

15) Attempt to fix restart of IPV6 routing table dumps when there is
    an intervening table update.  From Eric Dumazet.

16) Respect security_inet_conn_request() return value in ipv6 TCP.  From
    Neal Cardwell.

17) Add another iPAD device ID to ipheth driver, from Davide Gerhard.

18) Fix access to freed SKB in l2tp_eth_dev_xmit(), and fix l2tp lockdep
    splats, from Eric Dumazet.

19) Make sure all bridge devices, regardless of whether they were created
    via netlink or ioctls, have their rtnetlink ops hooked up.  From
    Thomas Graf and Stephen Hemminger.

Please pull, thanks a lot!

The following changes since commit 424d54d2dca03805942055e5b19926d33a7d1e31:

  Merge git://git.kernel.org/pub/scm/virt/kvm/kvm (2012-06-14 15:46:59 +0300)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net master

for you to fetch changes up to a969dd139cc2f2bccdcb11894f0695517cf84d4d:

  Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can (2012-06-27 15:27:24 -0700)

----------------------------------------------------------------

Alexander Duyck (2):
      ixgbe: Fix memory leak in ixgbe when receiving traffic on DDP enabled rings
      ixgbe: Do not pad FCoE frames as this can cause issues with FCoE DDP

Amerigo Wang (1):
      bonding: show all the link status of slaves

Andrei Emeltchenko (1):
      Bluetooth: btmrvl: Do not send vendor events to bluetooth stack

Antonio Quartulli (2):
      batman-adv: fix skb->data assignment
      batman-adv: fix race condition in TT full-table replacement

Ashok Nagarajan (1):
      mac80211: add missing kernel-doc

Avinash Patil (2):
      mwifiex: fix incorrect privacy setting in beacon and probe response
      mwifiex: fix uAP TX packet timeout issue

Bing Zhao (1):
      mwifiex: fix wrong return values in add_virtual_intf() error cases

Bjørn Mork (1):
      net: qmi_wwan: fix Gobi device probing

Bob Copeland (1):
      ath5k: remove _bh from inner locks

Carolyn Wyborny (2):
      igb: Fix incorrect RAR address entries for i210/i211 device.
      Kconfig: Fix Kconfig for Intel ixgbe and igb PTP support.

Dan Carpenter (4):
      can: c_can: precedence error in c_can_chip_config()
      qlcnic: off by one in qlcnic_init_pci_info()
      airo: copying wrong data in airo_get_aplist()
      9p: fix min_t() casting in p9pdu_vwritef()

Daniel Halperin (1):
      sctp: fix warning when compiling without IPv6

David S. Miller (2):
      Revert "ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_route"
      Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can

David Spinadel (1):
      mac80211: stop polling in disassociation

Davide Gerhard (1):
      ipheth: add support for iPad

Eliad Peller (2):
      cfg80211: fix potential deadlock in regulatory
      mac80211: check sdata_running on ieee80211_set_bitrate_mask

Eric Dumazet (5):
      bnx2x: fix panic when TX ring is full
      net: remove skb_orphan_try()
      ipv6: fib: fix fib dump restart
      net: l2tp_eth: fix l2tp_eth_dev_xmit race
      net: l2tp_eth: use LLTX to avoid LOCKDEP splats

Felix Fietkau (2):
      ath9k: fix a tx rate duration calculation bug
      ath9k: fix invalid pointer access in the tx path

Giancarlo Formicuccia (1):
      Bluetooth: add support for atheros 0930:0219

Grazvydas Ignotas (3):
      wl1251: fix TSF calculation
      wl1251: always report beacon loss to the stack
      wl1251: Fix memory leaks in SPI initialization

Hui Wang (1):
      can: flexcan: use be32_to_cpup to handle the value of dt entry

Ian Campbell (1):
      xen/netfront: teardown the device before unregistering it.

Jacob Keller (1):
      ixgbe: Fix PHC loophole allowing misconfiguration of increment register

Jason Wang (1):
      phy/micrel: change phy_id_mask for KSZ9021 and KS8001

Jens Freimann (1):
      vhost: use USER_DS in vhost_worker thread

Johan Hedberg (4):
      Bluetooth: Fix SMP pairing method selection
      Bluetooth: Fix deadlock and crash when SMP pairing times out
      Bluetooth: Fix SMP security elevation from medium to high
      Bluetooth: Add support for encryption key refresh

Johannes Berg (2):
      mac80211: add some missing kernel-doc
      iwlwifi: remove log_event debugfs file debugging is disabled

John W. Linville (5):
      Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
      Merge branch 'for-john' of git://git.kernel.org/.../jberg/mac80211
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem
      Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem

Jussi Kivilinna (1):
      rndis_wlan: fix matching bssid check in rndis_check_bssid_list()

Marek Lindner (1):
      batman-adv: only drop packets of known wifi clients

Marek Vasut (1):
      Bluetooth: Support AR3011 in Acer Iconia Tab W500

Michal Kazior (1):
      cfg80211: check iface combinations only when iface is running

Ming Lei (3):
      usbnet: clear OPEN flag in failure path
      usbnet: decrease suspend count if returning -EBUSY for runtime suspend
      usbnet: handle remote wakeup asap

Mohammed Shafi Shajakhan (4):
      ath9k: Fix a WARNING on suspend/resume with IBSS
      ath9k: remove incompatible IBSS interface check in change_iface
      ath9k: Fix softlockup in AR9485
      ath9k_hw: avoid possible infinite loop in ar9003_get_pll_sqsum_dvc

Neal Cardwell (1):
      tcp: heed result of security_inet_conn_request() in tcp_v6_conn_request()

Per Ellefsen (1):
      caif-hsi: Bugfix - Piggyback'ed embedded CAIF frame lost

Phil Sutter (1):
      usbnet: sanitise overlong driver information strings

Rajkumar Manoharan (1):
      ath9k_htc: configure bssid on ASSOC/IBSS change

Rémi Denis-Courmont (1):
      net: remove my future former mail address

Sarveshwar Bandi (1):
      be2net: reduce gso_max_size setting to account for ethernet header.

Sjur Brændeland (2):
      caif: Clear shutdown mask to zero at reconnect.
      caif-hsi: Add missing return in error path

Szymon Janc (1):
      Bluetooth: Fix using uninitialized option in RFCMode

Thomas Graf (2):
      ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_route
      ipv6: Move ipv6 proc file registration to end of init order

Vasundhara Volam (2):
      be2net: Modify error message to incorporate subsystem
      be2net: Increase statistics structure size for skyhawk.

Vishal Agarwal (2):
      Bluetooth: Fix LE pairing completion on connection failure
      Bluetooth: Fix sending HCI_Disconnect only when connected

Yevgeny Petrilin (3):
      net/mlx4_en: Set correct port parameters during device initialization
      net/mlx4: Use single completion vector after NOP failure
      net/mlx4_en: Release QP range in free_resources

Yoshihiro Shimoda (1):
      net: sh_eth: fix the condition to fix the cur_tx/dirty_rx

Yuval Mintz (2):
      bnx2x: fix I2C non-respondent issue
      bnx2x: fix link for BCM57711 with 84823 phy

alex.bluesman.smirnov@gmail.com (1):
      mac802154: add missed braces

françois romieu (1):
      r8169: RxConfig hack for the 8168evl.

stephen hemminger (1):
      bridge: Assign rtnl_link_ops to bridge devices created via ioctl (v2)

 drivers/bluetooth/ath3k.c                        |    3 ++
 drivers/bluetooth/btmrvl_drv.h                   |    2 +-
 drivers/bluetooth/btmrvl_main.c                  |   14 +++++++--
 drivers/bluetooth/btmrvl_sdio.c                  |    8 +++--
 drivers/bluetooth/btusb.c                        |    2 ++
 drivers/net/bonding/bond_procfs.c                |   15 +++++++--
 drivers/net/caif/caif_hsi.c                      |    5 +--
 drivers/net/can/c_can/c_can.c                    |    4 +--
 drivers/net/can/flexcan.c                        |    4 +--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |    8 ++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c |   54 ++++++++++++++++++--------------
 drivers/net/ethernet/emulex/benet/be_cmds.c      |   12 +++----
 drivers/net/ethernet/emulex/benet/be_cmds.h      |    2 +-
 drivers/net/ethernet/emulex/benet/be_main.c      |    2 +-
 drivers/net/ethernet/intel/Kconfig               |   10 ++++--
 drivers/net/ethernet/intel/igb/e1000_82575.c     |    2 --
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    4 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |    2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   16 +++++++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c     |   13 ++++++--
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   18 +++++++----
 drivers/net/ethernet/mellanox/mlx4/main.c        |    2 ++
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h     |    1 +
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |    2 +-
 drivers/net/ethernet/realtek/r8169.c             |    1 +
 drivers/net/ethernet/renesas/sh_eth.c            |   12 ++++---
 drivers/net/phy/micrel.c                         |    8 ++---
 drivers/net/usb/ipheth.c                         |    5 +++
 drivers/net/usb/qmi_wwan.c                       |   83 ++++++++++++++++++++++++-------------------------
 drivers/net/usb/usbnet.c                         |   53 +++++++++++++++++++------------
 drivers/net/wireless/airo.c                      |    4 +--
 drivers/net/wireless/ath/ath5k/base.c            |    4 +--
 drivers/net/wireless/ath/ath9k/ath9k.h           |    1 +
 drivers/net/wireless/ath/ath9k/htc_drv_main.c    |    5 ++-
 drivers/net/wireless/ath/ath9k/hw.c              |   14 ++++++++-
 drivers/net/wireless/ath/ath9k/main.c            |   27 ++++++----------
 drivers/net/wireless/ath/ath9k/xmit.c            |   31 ++++++++++--------
 drivers/net/wireless/iwlwifi/iwl-debugfs.c       |    6 ++++
 drivers/net/wireless/mwifiex/cfg80211.c          |   25 +++++++--------
 drivers/net/wireless/mwifiex/txrx.c              |   10 ++----
 drivers/net/wireless/mwifiex/uap_cmd.c           |   11 +++++++
 drivers/net/wireless/rndis_wlan.c                |    2 +-
 drivers/net/wireless/ti/wl1251/acx.c             |    2 +-
 drivers/net/wireless/ti/wl1251/event.c           |    3 +-
 drivers/net/wireless/ti/wl1251/spi.c             |    4 +++
 drivers/net/xen-netfront.c                       |    8 ++---
 drivers/vhost/vhost.c                            |    3 ++
 include/linux/skbuff.h                           |    7 ++---
 include/net/bluetooth/hci.h                      |    6 ++++
 include/net/mac80211.h                           |    6 ++++
 include/net/phonet/gprs.h                        |    2 +-
 net/9p/protocol.c                                |    2 +-
 net/batman-adv/routing.c                         |    2 ++
 net/batman-adv/translation-table.c               |   12 +++----
 net/bluetooth/hci_event.c                        |   48 ++++++++++++++++++++++++++++
 net/bluetooth/l2cap_core.c                       |   21 ++++++++-----
 net/bluetooth/mgmt.c                             |   20 +++++++++++-
 net/bluetooth/smp.c                              |   11 ++++---
 net/bridge/br_if.c                               |    1 +
 net/bridge/br_netlink.c                          |    2 +-
 net/bridge/br_private.h                          |    1 +
 net/caif/caif_dev.c                              |    3 +-
 net/caif/caif_socket.c                           |    1 +
 net/can/raw.c                                    |    3 --
 net/core/dev.c                                   |   23 +-------------
 net/ipv6/ip6_fib.c                               |    4 +--
 net/ipv6/route.c                                 |   41 ++++++++++++++++++------
 net/ipv6/tcp_ipv6.c                              |    3 +-
 net/iucv/af_iucv.c                               |    1 -
 net/l2tp/l2tp_eth.c                              |   45 ++++++++++++++++++++-------
 net/mac80211/cfg.c                               |    3 ++
 net/mac80211/mlme.c                              |    4 +--
 net/mac80211/sta_info.h                          |    5 +++
 net/mac802154/tx.c                               |    3 +-
 net/phonet/af_phonet.c                           |    4 +--
 net/phonet/datagram.c                            |    4 +--
 net/phonet/pep-gprs.c                            |    2 +-
 net/phonet/pep.c                                 |    2 +-
 net/phonet/pn_dev.c                              |    4 +--
 net/phonet/pn_netlink.c                          |    4 +--
 net/phonet/socket.c                              |    4 +--
 net/phonet/sysctl.c                              |    2 +-
 net/sctp/protocol.c                              |    2 ++
 net/wireless/reg.c                               |    2 +-
 net/wireless/util.c                              |    2 +-
 85 files changed, 529 insertions(+), 310 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next] ipv4: tcp: dont cache unconfirmed intput dst
From: Eric Dumazet @ 2012-06-28  5:13 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, hans.schillstrom
In-Reply-To: <1340860102.26242.203.camel@edumazet-glaptop>

On Thu, 2012-06-28 at 07:08 +0200, Eric Dumazet wrote:

> The initial idea was to perform this only for SYN packets received on a
> listener in SYNCOOKIE mode. I'll resend the patch when fully
> implemented, instead of a forward patch.
> 

s/forward/followup/

;)

^ permalink raw reply

* Re: [PATCH] Build fix in drivers/net/wireless/ath/ath9k/main.c
From: Mohammed Shafi @ 2012-06-28  5:09 UTC (permalink / raw)
  To: Arvydas Sidorenko
  Cc: mcgrof-A+ZNKFmMK5xy9aJCnZT0Uw, jouni-A+ZNKFmMK5xy9aJCnZT0Uw,
	vthiagar-A+ZNKFmMK5xy9aJCnZT0Uw, senthilb-A+ZNKFmMK5xy9aJCnZT0Uw,
	linville-2XuSBdqkA4R54TAoqtyWWQ,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	ath9k-devel-xDcbHBWguxHbcTqmT+pZeQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1340824779-5157-1-git-send-email-asido4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Hi,

On Thu, Jun 28, 2012 at 12:49 AM, Arvydas Sidorenko <asido4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Commit fad29cd2f59949581050a937786c2c9bc78b2f04 broke the build if
> no CONFIG_ATH9K_BTCOEX_SUPPORT is enabled.
>
> Signed-off-by: Arvydas Sidorenko <asido4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  drivers/net/wireless/ath/ath9k/main.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
> index c14cf5a..e4e73f0 100644
> --- a/drivers/net/wireless/ath/ath9k/main.c
> +++ b/drivers/net/wireless/ath/ath9k/main.c
> @@ -151,8 +151,10 @@ static void __ath_cancel_work(struct ath_softc *sc)
>        cancel_delayed_work_sync(&sc->tx_complete_work);
>        cancel_delayed_work_sync(&sc->hw_pll_work);
>
> +#ifdef CONFIG_ATH9K_BTCOEX_SUPPORT
>        if (ath9k_hw_mci_is_enabled(sc->sc_ah))
>                cancel_work_sync(&sc->mci_work);
> +#endif
>  }
>
>  static void ath_cancel_work(struct ath_softc *sc)
> --
> 1.7.8.6

Thanks for the patch, but this was already sent some time back:
http://www.spinics.net/lists/linux-wireless/msg93078.html


>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
thanks,
shafi

^ permalink raw reply

* Re: [PATCH net-next] ipv4: tcp: dont cache unconfirmed intput dst
From: Eric Dumazet @ 2012-06-28  5:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, hans.schillstrom
In-Reply-To: <20120627.170830.811332455348620174.davem@davemloft.net>

On Wed, 2012-06-27 at 17:08 -0700, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 27 Jun 2012 17:01:01 -0700 (PDT)
> 
> > There are quite a number of unwanted side effects from this change, so
> > I think we'll have to revert unless you can fix up all of the relevant
> > cases quickly.
> 
> Actually I've decided to revert it now.
> 
> Whilst this was a swell idea, there is no way for you to know if
> we should really create a cached route or not.
> 
> Even if you could, there is a lot of logic you'll need to code up
> so that, f.e., once we determine that we've got a DST_NOCACHE route
> when we move to established state, we can insert it into the routing
> cache and not mark it DST_NOCACHE any longer.
> 
> But even if we did that, we're going to eat 2 uncached route lookups
> for every new incoming legitimate connection.

The initial idea was to perform this only for SYN packets received on a
listener in SYNCOOKIE mode. I'll resend the patch when fully
implemented, instead of a forward patch.

Thanks

^ permalink raw reply

* [PATCH] ipv4: Kill early demux method return value.
From: David Miller @ 2012-06-28  5:07 UTC (permalink / raw)
  To: netdev


It's completely unnecessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/protocol.h |    2 +-
 include/net/tcp.h      |    2 +-
 net/ipv4/ip_input.c    |   42 +++++++++++++++++++-----------------------
 net/ipv4/tcp_ipv4.c    |   19 ++++++-------------
 4 files changed, 27 insertions(+), 38 deletions(-)

diff --git a/include/net/protocol.h b/include/net/protocol.h
index 967b926..057f2d3 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -37,7 +37,7 @@
 
 /* This is used to register protocols. */
 struct net_protocol {
-	int			(*early_demux)(struct sk_buff *skb);
+	void			(*early_demux)(struct sk_buff *skb);
 	int			(*handler)(struct sk_buff *skb);
 	void			(*err_handler)(struct sk_buff *skb, u32 info);
 	int			(*gso_send_check)(struct sk_buff *skb);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6660ffc..53fb7d8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -325,7 +325,7 @@ extern void tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void tcp_shutdown (struct sock *sk, int how);
 
-extern int tcp_v4_early_demux(struct sk_buff *skb);
+extern void tcp_v4_early_demux(struct sk_buff *skb);
 extern int tcp_v4_rcv(struct sk_buff *skb);
 
 extern struct inet_peer *tcp_v4_get_peer(struct sock *sk);
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 2a39204..b27d444 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -320,33 +320,29 @@ static int ip_rcv_finish(struct sk_buff *skb)
 	const struct iphdr *iph = ip_hdr(skb);
 	struct rtable *rt;
 
+	if (sysctl_ip_early_demux && !skb_dst(skb)) {
+		const struct net_protocol *ipprot;
+		int protocol = iph->protocol;
+
+		rcu_read_lock();
+		ipprot = rcu_dereference(inet_protos[protocol]);
+		if (ipprot && ipprot->early_demux)
+			ipprot->early_demux(skb);
+		rcu_read_unlock();
+	}
+
 	/*
 	 *	Initialise the virtual path cache for the packet. It describes
 	 *	how the packet travels inside Linux networking.
 	 */
-	if (skb_dst(skb) == NULL) {
-		int err = -ENOENT;
-
-		if (sysctl_ip_early_demux) {
-			const struct net_protocol *ipprot;
-			int protocol = iph->protocol;
-
-			rcu_read_lock();
-			ipprot = rcu_dereference(inet_protos[protocol]);
-			if (ipprot && ipprot->early_demux)
-				err = ipprot->early_demux(skb);
-			rcu_read_unlock();
-		}
-
-		if (err) {
-			err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
-						   iph->tos, skb->dev);
-			if (unlikely(err)) {
-				if (err == -EXDEV)
-					NET_INC_STATS_BH(dev_net(skb->dev),
-							 LINUX_MIB_IPRPFILTER);
-				goto drop;
-			}
+	if (!skb_dst(skb)) {
+		int err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
+					       iph->tos, skb->dev);
+		if (unlikely(err)) {
+			if (err == -EXDEV)
+				NET_INC_STATS_BH(dev_net(skb->dev),
+						 LINUX_MIB_IPRPFILTER);
+			goto drop;
 		}
 	}
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1781dc6..b4ae1c1 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1673,30 +1673,28 @@ csum_err:
 }
 EXPORT_SYMBOL(tcp_v4_do_rcv);
 
-int tcp_v4_early_demux(struct sk_buff *skb)
+void tcp_v4_early_demux(struct sk_buff *skb)
 {
 	struct net *net = dev_net(skb->dev);
 	const struct iphdr *iph;
 	const struct tcphdr *th;
 	struct net_device *dev;
 	struct sock *sk;
-	int err;
 
-	err = -ENOENT;
 	if (skb->pkt_type != PACKET_HOST)
-		goto out_err;
+		return;
 
 	if (!pskb_may_pull(skb, ip_hdrlen(skb) + sizeof(struct tcphdr)))
-		goto out_err;
+		return;
 
 	iph = ip_hdr(skb);
 	th = (struct tcphdr *) ((char *)iph + ip_hdrlen(skb));
 
 	if (th->doff < sizeof(struct tcphdr) / 4)
-		goto out_err;
+		return;
 
 	if (!pskb_may_pull(skb, ip_hdrlen(skb) + th->doff * 4))
-		goto out_err;
+		return;
 
 	dev = skb->dev;
 	sk = __inet_lookup_established(net, &tcp_hashinfo,
@@ -1713,16 +1711,11 @@ int tcp_v4_early_demux(struct sk_buff *skb)
 			if (dst) {
 				struct rtable *rt = (struct rtable *) dst;
 
-				if (rt->rt_iif == dev->ifindex) {
+				if (rt->rt_iif == dev->ifindex)
 					skb_dst_set_noref(skb, dst);
-					err = 0;
-				}
 			}
 		}
 	}
-
-out_err:
-	return err;
 }
 
 /*
-- 
1.7.10

^ permalink raw reply related

* Re: [PATCH v2] l2tp: use per-cpu variables for u64_stats updates
From: Eric Dumazet @ 2012-06-28  5:00 UTC (permalink / raw)
  To: Rick Jones
  Cc: Ben Greear, Stephen Hemminger, Tom Parkin, netdev, David.Laight,
	James Chapman
In-Reply-To: <4FEB90C3.9050607@hp.com>

On Wed, 2012-06-27 at 16:01 -0700, Rick Jones wrote:

> Today, sure, generalizing to packet counters in general, that bloat is 
> likely on its way.  At 100 Gbit/s Ethernet, that is upwards of 147 
> million packets per second each way.  At 1 GbE it is 125 million octets 
> per second.  So, if 32 bit octet counters were insufficient for 1 GbE, 
> 32 bit packet counters likely will be insufficient for 100GbE.
> 
> Or, I suppose, 3 or more bonded 40 GbEs or 10 or more bonded 10 GbEs 
> (unlikely though that last one may be) assuming there is stats 
> aggregation in the bond interface.

Note that I am all for 64bit counters on 64bit kernels because they are
almost[1] free, since they fit in a machine word (unsigned long).

tx_dropped is the count of dropped _packets_.

If more than 32bits are needed, and someone must run this 100GbE on a
32bit machine of the last century, he really has a big problem.
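
For scale, a small sketch of the wrap-around arithmetic using Rick's
line-rate figures, assuming the worst case where every packet (or octet)
bumps the counter. sketch_wrap_seconds() is a made-up helper for
illustration only.

```c
#include <assert.h>

/* Seconds until an unsigned counter of `bits` width wraps at a given rate. */
static double sketch_wrap_seconds(unsigned bits, double rate_per_sec)
{
	double span = 1.0;
	unsigned i;

	assert(bits > 0 && bits <= 64 && rate_per_sec > 0.0);
	/* compute 2^bits in floating point to avoid overflow at 64 bits */
	for (i = 0; i < bits; i++)
		span *= 2.0;
	return span / rate_per_sec;
}
```

sketch_wrap_seconds(32, 147e6) comes to roughly 29 seconds for a 32bit
packet counter at 100GbE, and sketch_wrap_seconds(32, 125e6) to roughly 34
seconds for a 32bit octet counter at 1 GbE. A 64bit counter at 147 Mpps
takes on the order of thousands of years to wrap. A drop counter would
normally advance far more slowly than line rate.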

[1] : LLTX drivers case:
   Since ndo_start_xmit() can be run concurrently by many cpus, safely
   updating an "unsigned long" requires additional hassle, one of:

   1) Use a spinlock to protect the update.
   2) Use atomic_long_t instead of "unsigned long".
   3) Use percpu data.

3) is overkill for devices with light traffic, because it consumes a lot
of RAM on machines with 2048 possible cpus, _and_ the reader must fold
the per-cpu values of all possible cpus.
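
To make the trade-off between options 2) and 3) concrete, here is a
minimal userspace sketch. It is not kernel code: NR_SLOTS, the struct
names and the 64-byte padding are assumptions made up for illustration,
and C11 atomics stand in for atomic_long_t. It only shows why the
per-cpu variant costs memory per possible cpu and why the reader has to
fold every slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_SLOTS 8	/* hypothetical stand-in for the number of possible cpus */

/* Option 2: one shared counter, every writer pays for an atomic update. */
struct shared_counter {
	atomic_ulong tx_dropped;
};

void shared_counter_inc(struct shared_counter *c)
{
	atomic_fetch_add_explicit(&c->tx_dropped, 1, memory_order_relaxed);
}

/* Option 3: one slot per cpu, padded (64 bytes assumed) to avoid sharing
 * a cache line. Memory use grows with the number of possible cpus. */
struct percpu_counter_sketch {
	struct {
		unsigned long tx_dropped;
		char pad[64 - sizeof(unsigned long)];
	} slot[NR_SLOTS];
};

/* Writer side: plain increment of the slot owned by this cpu. */
void percpu_counter_inc(struct percpu_counter_sketch *c, unsigned int cpu)
{
	assert(cpu < NR_SLOTS);
	c->slot[cpu].tx_dropped++;
}

/* Reader side: must walk all slots, even those that were never used. */
uint64_t percpu_counter_fold(const struct percpu_counter_sketch *c)
{
	uint64_t sum = 0;

	for (unsigned int cpu = 0; cpu < NR_SLOTS; cpu++)
		sum += c->slot[cpu].tx_dropped;
	return sum;
}
```

With 2048 possible cpus the per-cpu structure above would take 128 KB
for a single counter, which is the RAM cost mentioned for lightly loaded
devices.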

^ permalink raw reply

* [PATCH] xfrm_user: Propagate netlink error codes properly.
From: David Miller @ 2012-06-28  4:57 UTC (permalink / raw)
  To: netdev


Instead of using a fixed value of "-1" or "-EMSGSIZE", propagate what
the nla_*() interfaces actually return.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/xfrm.h   |   10 +-
 net/xfrm/xfrm_user.c |  394 ++++++++++++++++++++++++++------------------------
 2 files changed, 208 insertions(+), 196 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index e0a55df..17acbc9 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1682,13 +1682,11 @@ static inline int xfrm_mark_get(struct nlattr **attrs, struct xfrm_mark *m)
 
 static inline int xfrm_mark_put(struct sk_buff *skb, const struct xfrm_mark *m)
 {
-	if ((m->m | m->v) &&
-	    nla_put(skb, XFRMA_MARK, sizeof(struct xfrm_mark), m))
-		goto nla_put_failure;
-	return 0;
+	int ret = 0;
 
-nla_put_failure:
-	return -1;
+	if (m->m | m->v)
+		ret = nla_put(skb, XFRMA_MARK, sizeof(struct xfrm_mark), m);
+	return ret;
 }
 
 #endif	/* _NET_XFRM_H */
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 44293b3..5407627 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -754,58 +754,67 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
 				    struct xfrm_usersa_info *p,
 				    struct sk_buff *skb)
 {
-	copy_to_user_state(x, p);
-
-	if (x->coaddr &&
-	    nla_put(skb, XFRMA_COADDR, sizeof(*x->coaddr), x->coaddr))
-		goto nla_put_failure;
-
-	if (x->lastused &&
-	    nla_put_u64(skb, XFRMA_LASTUSED, x->lastused))
-		goto nla_put_failure;
-
-	if (x->aead &&
-	    nla_put(skb, XFRMA_ALG_AEAD, aead_len(x->aead), x->aead))
-		goto nla_put_failure;
-
-	if (x->aalg &&
-	    (copy_to_user_auth(x->aalg, skb) ||
-	     nla_put(skb, XFRMA_ALG_AUTH_TRUNC,
-		     xfrm_alg_auth_len(x->aalg), x->aalg)))
-		goto nla_put_failure;
-
-	if (x->ealg &&
-	    nla_put(skb, XFRMA_ALG_CRYPT, xfrm_alg_len(x->ealg), x->ealg))
-		goto nla_put_failure;
-
-	if (x->calg &&
-	    nla_put(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg))
-		goto nla_put_failure;
-
-	if (x->encap &&
-	    nla_put(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap))
-		goto nla_put_failure;
+	int ret = 0;
 
-	if (x->tfcpad &&
-	    nla_put_u32(skb, XFRMA_TFCPAD, x->tfcpad))
-		goto nla_put_failure;
-
-	if (xfrm_mark_put(skb, &x->mark))
-		goto nla_put_failure;
-
-	if (x->replay_esn &&
-	    nla_put(skb, XFRMA_REPLAY_ESN_VAL,
-		    xfrm_replay_state_esn_len(x->replay_esn),
-		    x->replay_esn))
-		goto nla_put_failure;
-
-	if (x->security && copy_sec_ctx(x->security, skb))
-		goto nla_put_failure;
-
-	return 0;
+	copy_to_user_state(x, p);
 
-nla_put_failure:
-	return -EMSGSIZE;
+	if (x->coaddr) {
+		ret = nla_put(skb, XFRMA_COADDR, sizeof(*x->coaddr), x->coaddr);
+		if (ret)
+			goto out;
+	}
+	if (x->lastused) {
+		ret = nla_put_u64(skb, XFRMA_LASTUSED, x->lastused);
+		if (ret)
+			goto out;
+	}
+	if (x->aead) {
+		ret = nla_put(skb, XFRMA_ALG_AEAD, aead_len(x->aead), x->aead);
+		if (ret)
+			goto out;
+	}
+	if (x->aalg) {
+		ret = copy_to_user_auth(x->aalg, skb);
+		if (!ret)
+			ret = nla_put(skb, XFRMA_ALG_AUTH_TRUNC,
+				      xfrm_alg_auth_len(x->aalg), x->aalg);
+		if (ret)
+			goto out;
+	}
+	if (x->ealg) {
+		ret = nla_put(skb, XFRMA_ALG_CRYPT, xfrm_alg_len(x->ealg), x->ealg);
+		if (ret)
+			goto out;
+	}
+	if (x->calg) {
+		ret = nla_put(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg);
+		if (ret)
+			goto out;
+	}
+	if (x->encap) {
+		ret = nla_put(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap);
+		if (ret)
+			goto out;
+	}
+	if (x->tfcpad) {
+		ret = nla_put_u32(skb, XFRMA_TFCPAD, x->tfcpad);
+		if (ret)
+			goto out;
+	}
+	ret = xfrm_mark_put(skb, &x->mark);
+	if (ret)
+		goto out;
+	if (x->replay_esn) {
+		ret = nla_put(skb, XFRMA_REPLAY_ESN_VAL,
+			      xfrm_replay_state_esn_len(x->replay_esn),
+			      x->replay_esn);
+		if (ret)
+			goto out;
+	}
+	if (x->security)
+		ret = copy_sec_ctx(x->security, skb);
+out:
+	return ret;
 }
 
 static int dump_one_state(struct xfrm_state *x, int count, void *ptr)
@@ -825,15 +834,12 @@ static int dump_one_state(struct xfrm_state *x, int count, void *ptr)
 	p = nlmsg_data(nlh);
 
 	err = copy_to_user_state_extra(x, p, skb);
-	if (err)
-		goto nla_put_failure;
-
+	if (err) {
+		nlmsg_cancel(skb, nlh);
+		return err;
+	}
 	nlmsg_end(skb, nlh);
 	return 0;
-
-nla_put_failure:
-	nlmsg_cancel(skb, nlh);
-	return err;
 }
 
 static int xfrm_dump_sa_done(struct netlink_callback *cb)
@@ -904,6 +910,7 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
 	struct xfrmu_spdinfo spc;
 	struct xfrmu_spdhinfo sph;
 	struct nlmsghdr *nlh;
+	int err;
 	u32 *f;
 
 	nlh = nlmsg_put(skb, pid, seq, XFRM_MSG_NEWSPDINFO, sizeof(u32), 0);
@@ -922,15 +929,15 @@ static int build_spdinfo(struct sk_buff *skb, struct net *net,
 	sph.spdhcnt = si.spdhcnt;
 	sph.spdhmcnt = si.spdhmcnt;
 
-	if (nla_put(skb, XFRMA_SPD_INFO, sizeof(spc), &spc) ||
-	    nla_put(skb, XFRMA_SPD_HINFO, sizeof(sph), &sph))
-		goto nla_put_failure;
+	err = nla_put(skb, XFRMA_SPD_INFO, sizeof(spc), &spc);
+	if (!err)
+		err = nla_put(skb, XFRMA_SPD_HINFO, sizeof(sph), &sph);
+	if (err) {
+		nlmsg_cancel(skb, nlh);
+		return err;
+	}
 
 	return nlmsg_end(skb, nlh);
-
-nla_put_failure:
-	nlmsg_cancel(skb, nlh);
-	return -EMSGSIZE;
 }
 
 static int xfrm_get_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -965,6 +972,7 @@ static int build_sadinfo(struct sk_buff *skb, struct net *net,
 	struct xfrmk_sadinfo si;
 	struct xfrmu_sadhinfo sh;
 	struct nlmsghdr *nlh;
+	int err;
 	u32 *f;
 
 	nlh = nlmsg_put(skb, pid, seq, XFRM_MSG_NEWSADINFO, sizeof(u32), 0);
@@ -978,15 +986,15 @@ static int build_sadinfo(struct sk_buff *skb, struct net *net,
 	sh.sadhmcnt = si.sadhmcnt;
 	sh.sadhcnt = si.sadhcnt;
 
-	if (nla_put_u32(skb, XFRMA_SAD_CNT, si.sadcnt) ||
-	    nla_put(skb, XFRMA_SAD_HINFO, sizeof(sh), &sh))
-		goto nla_put_failure;
+	err = nla_put_u32(skb, XFRMA_SAD_CNT, si.sadcnt);
+	if (!err)
+		err = nla_put(skb, XFRMA_SAD_HINFO, sizeof(sh), &sh);
+	if (err) {
+		nlmsg_cancel(skb, nlh);
+		return err;
+	}
 
 	return nlmsg_end(skb, nlh);
-
-nla_put_failure:
-	nlmsg_cancel(skb, nlh);
-	return -EMSGSIZE;
 }
 
 static int xfrm_get_sadinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -1439,9 +1447,8 @@ static inline int copy_to_user_state_sec_ctx(struct xfrm_state *x, struct sk_buf
 
 static inline int copy_to_user_sec_ctx(struct xfrm_policy *xp, struct sk_buff *skb)
 {
-	if (xp->security) {
+	if (xp->security)
 		return copy_sec_ctx(xp->security, skb);
-	}
 	return 0;
 }
 static inline size_t userpolicy_type_attrsize(void)
@@ -1477,6 +1484,7 @@ static int dump_one_policy(struct xfrm_policy *xp, int dir, int count, void *ptr
 	struct sk_buff *in_skb = sp->in_skb;
 	struct sk_buff *skb = sp->out_skb;
 	struct nlmsghdr *nlh;
+	int err;
 
 	nlh = nlmsg_put(skb, NETLINK_CB(in_skb).pid, sp->nlmsg_seq,
 			XFRM_MSG_NEWPOLICY, sizeof(*p), sp->nlmsg_flags);
@@ -1485,22 +1493,19 @@ static int dump_one_policy(struct xfrm_policy *xp, int dir, int count, void *ptr
 
 	p = nlmsg_data(nlh);
 	copy_to_user_policy(xp, p, dir);
-	if (copy_to_user_tmpl(xp, skb) < 0)
-		goto nlmsg_failure;
-	if (copy_to_user_sec_ctx(xp, skb))
-		goto nlmsg_failure;
-	if (copy_to_user_policy_type(xp->type, skb) < 0)
-		goto nlmsg_failure;
-	if (xfrm_mark_put(skb, &xp->mark))
-		goto nla_put_failure;
-
+	err = copy_to_user_tmpl(xp, skb);
+	if (!err)
+		err = copy_to_user_sec_ctx(xp, skb);
+	if (!err)
+		err = copy_to_user_policy_type(xp->type, skb);
+	if (!err)
+		err = xfrm_mark_put(skb, &xp->mark);
+	if (err) {
+		nlmsg_cancel(skb, nlh);
+		return err;
+	}
 	nlmsg_end(skb, nlh);
 	return 0;
-
-nla_put_failure:
-nlmsg_failure:
-	nlmsg_cancel(skb, nlh);
-	return -EMSGSIZE;
 }
 
 static int xfrm_dump_policy_done(struct netlink_callback *cb)
@@ -1688,6 +1693,7 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct
 {
 	struct xfrm_aevent_id *id;
 	struct nlmsghdr *nlh;
+	int err;
 
 	nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_NEWAE, sizeof(*id), 0);
 	if (nlh == NULL)
@@ -1703,35 +1709,39 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct
 	id->flags = c->data.aevent;
 
 	if (x->replay_esn) {
-		if (nla_put(skb, XFRMA_REPLAY_ESN_VAL,
-			    xfrm_replay_state_esn_len(x->replay_esn),
-			    x->replay_esn))
-			goto nla_put_failure;
+		err = nla_put(skb, XFRMA_REPLAY_ESN_VAL,
+			      xfrm_replay_state_esn_len(x->replay_esn),
+			      x->replay_esn);
 	} else {
-		if (nla_put(skb, XFRMA_REPLAY_VAL, sizeof(x->replay),
-			    &x->replay))
-			goto nla_put_failure;
+		err = nla_put(skb, XFRMA_REPLAY_VAL, sizeof(x->replay),
+			      &x->replay);
 	}
-	if (nla_put(skb, XFRMA_LTIME_VAL, sizeof(x->curlft), &x->curlft))
-		goto nla_put_failure;
-
-	if ((id->flags & XFRM_AE_RTHR) &&
-	    nla_put_u32(skb, XFRMA_REPLAY_THRESH, x->replay_maxdiff))
-		goto nla_put_failure;
-
-	if ((id->flags & XFRM_AE_ETHR) &&
-	    nla_put_u32(skb, XFRMA_ETIMER_THRESH,
-			x->replay_maxage * 10 / HZ))
-		goto nla_put_failure;
+	if (err)
+		goto out_cancel;
+	err = nla_put(skb, XFRMA_LTIME_VAL, sizeof(x->curlft), &x->curlft);
+	if (err)
+		goto out_cancel;
 
-	if (xfrm_mark_put(skb, &x->mark))
-		goto nla_put_failure;
+	if (id->flags & XFRM_AE_RTHR) {
+		err = nla_put_u32(skb, XFRMA_REPLAY_THRESH, x->replay_maxdiff);
+		if (err)
+			goto out_cancel;
+	}
+	if (id->flags & XFRM_AE_ETHR) {
+		err = nla_put_u32(skb, XFRMA_ETIMER_THRESH,
+				  x->replay_maxage * 10 / HZ);
+		if (err)
+			goto out_cancel;
+	}
+	err = xfrm_mark_put(skb, &x->mark);
+	if (err)
+		goto out_cancel;
 
 	return nlmsg_end(skb, nlh);
 
-nla_put_failure:
+out_cancel:
 	nlmsg_cancel(skb, nlh);
-	return -EMSGSIZE;
+	return err;
 }
 
 static int xfrm_get_ae(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -2155,7 +2165,7 @@ static int build_migrate(struct sk_buff *skb, const struct xfrm_migrate *m,
 	const struct xfrm_migrate *mp;
 	struct xfrm_userpolicy_id *pol_id;
 	struct nlmsghdr *nlh;
-	int i;
+	int i, err;
 
 	nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id), 0);
 	if (nlh == NULL)
@@ -2167,21 +2177,25 @@ static int build_migrate(struct sk_buff *skb, const struct xfrm_migrate *m,
 	memcpy(&pol_id->sel, sel, sizeof(pol_id->sel));
 	pol_id->dir = dir;
 
-	if (k != NULL && (copy_to_user_kmaddress(k, skb) < 0))
-			goto nlmsg_failure;
-
-	if (copy_to_user_policy_type(type, skb) < 0)
-		goto nlmsg_failure;
-
+	if (k != NULL) {
+		err = copy_to_user_kmaddress(k, skb);
+		if (err)
+			goto out_cancel;
+	}
+	err = copy_to_user_policy_type(type, skb);
+	if (err)
+		goto out_cancel;
 	for (i = 0, mp = m ; i < num_migrate; i++, mp++) {
-		if (copy_to_user_migrate(mp, skb) < 0)
-			goto nlmsg_failure;
+		err = copy_to_user_migrate(mp, skb);
+		if (err)
+			goto out_cancel;
 	}
 
 	return nlmsg_end(skb, nlh);
-nlmsg_failure:
+
+out_cancel:
 	nlmsg_cancel(skb, nlh);
-	return -EMSGSIZE;
+	return err;
 }
 
 static int xfrm_send_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
@@ -2354,6 +2368,7 @@ static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct
 {
 	struct xfrm_user_expire *ue;
 	struct nlmsghdr *nlh;
+	int err;
 
 	nlh = nlmsg_put(skb, c->pid, 0, XFRM_MSG_EXPIRE, sizeof(*ue), 0);
 	if (nlh == NULL)
@@ -2363,13 +2378,11 @@ static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct
 	copy_to_user_state(x, &ue->state);
 	ue->hard = (c->data.hard != 0) ? 1 : 0;
 
-	if (xfrm_mark_put(skb, &x->mark))
-		goto nla_put_failure;
+	err = xfrm_mark_put(skb, &x->mark);
+	if (err)
+		return err;
 
 	return nlmsg_end(skb, nlh);
-
-nla_put_failure:
-	return -EMSGSIZE;
 }
 
 static int xfrm_exp_state_notify(struct xfrm_state *x, const struct km_event *c)
@@ -2470,7 +2483,7 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
 	struct nlmsghdr *nlh;
 	struct sk_buff *skb;
 	int len = xfrm_sa_len(x);
-	int headlen;
+	int headlen, err;
 
 	headlen = sizeof(*p);
 	if (c->event == XFRM_MSG_DELSA) {
@@ -2485,8 +2498,9 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
 		return -ENOMEM;
 
 	nlh = nlmsg_put(skb, c->pid, c->seq, c->event, headlen, 0);
+	err = -EMSGSIZE;
 	if (nlh == NULL)
-		goto nla_put_failure;
+		goto out_free_skb;
 
 	p = nlmsg_data(nlh);
 	if (c->event == XFRM_MSG_DELSA) {
@@ -2499,24 +2513,23 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
 		id->proto = x->id.proto;
 
 		attr = nla_reserve(skb, XFRMA_SA, sizeof(*p));
+		err = -EMSGSIZE;
 		if (attr == NULL)
-			goto nla_put_failure;
+			goto out_free_skb;
 
 		p = nla_data(attr);
 	}
-
-	if (copy_to_user_state_extra(x, p, skb))
-		goto nla_put_failure;
+	err = copy_to_user_state_extra(x, p, skb);
+	if (err)
+		goto out_free_skb;
 
 	nlmsg_end(skb, nlh);
 
 	return nlmsg_multicast(net->xfrm.nlsk, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
 
-nla_put_failure:
-	/* Somebody screwed up with xfrm_sa_len! */
-	WARN_ON(1);
+out_free_skb:
 	kfree_skb(skb);
-	return -1;
+	return err;
 }
 
 static int xfrm_send_state_notify(struct xfrm_state *x, const struct km_event *c)
@@ -2557,9 +2570,10 @@ static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
 			 struct xfrm_tmpl *xt, struct xfrm_policy *xp,
 			 int dir)
 {
+	__u32 seq = xfrm_get_acqseq();
 	struct xfrm_user_acquire *ua;
 	struct nlmsghdr *nlh;
-	__u32 seq = xfrm_get_acqseq();
+	int err;
 
 	nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_ACQUIRE, sizeof(*ua), 0);
 	if (nlh == NULL)
@@ -2575,21 +2589,19 @@ static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
 	ua->calgos = xt->calgos;
 	ua->seq = x->km.seq = seq;
 
-	if (copy_to_user_tmpl(xp, skb) < 0)
-		goto nlmsg_failure;
-	if (copy_to_user_state_sec_ctx(x, skb))
-		goto nlmsg_failure;
-	if (copy_to_user_policy_type(xp->type, skb) < 0)
-		goto nlmsg_failure;
-	if (xfrm_mark_put(skb, &xp->mark))
-		goto nla_put_failure;
+	err = copy_to_user_tmpl(xp, skb);
+	if (!err)
+		err = copy_to_user_state_sec_ctx(x, skb);
+	if (!err)
+		err = copy_to_user_policy_type(xp->type, skb);
+	if (!err)
+		err = xfrm_mark_put(skb, &xp->mark);
+	if (err) {
+		nlmsg_cancel(skb, nlh);
+		return err;
+	}
 
 	return nlmsg_end(skb, nlh);
-
-nla_put_failure:
-nlmsg_failure:
-	nlmsg_cancel(skb, nlh);
-	return -EMSGSIZE;
 }
 
 static int xfrm_send_acquire(struct xfrm_state *x, struct xfrm_tmpl *xt,
@@ -2681,8 +2693,9 @@ static int build_polexpire(struct sk_buff *skb, struct xfrm_policy *xp,
 			   int dir, const struct km_event *c)
 {
 	struct xfrm_user_polexpire *upe;
-	struct nlmsghdr *nlh;
 	int hard = c->data.hard;
+	struct nlmsghdr *nlh;
+	int err;
 
 	nlh = nlmsg_put(skb, c->pid, 0, XFRM_MSG_POLEXPIRE, sizeof(*upe), 0);
 	if (nlh == NULL)
@@ -2690,22 +2703,20 @@ static int build_polexpire(struct sk_buff *skb, struct xfrm_policy *xp,
 
 	upe = nlmsg_data(nlh);
 	copy_to_user_policy(xp, &upe->pol, dir);
-	if (copy_to_user_tmpl(xp, skb) < 0)
-		goto nlmsg_failure;
-	if (copy_to_user_sec_ctx(xp, skb))
-		goto nlmsg_failure;
-	if (copy_to_user_policy_type(xp->type, skb) < 0)
-		goto nlmsg_failure;
-	if (xfrm_mark_put(skb, &xp->mark))
-		goto nla_put_failure;
+	err = copy_to_user_tmpl(xp, skb);
+	if (!err)
+		err = copy_to_user_sec_ctx(xp, skb);
+	if (!err)
+		err = copy_to_user_policy_type(xp->type, skb);
+	if (!err)
+		err = xfrm_mark_put(skb, &xp->mark);
+	if (err) {
+		nlmsg_cancel(skb, nlh);
+		return err;
+	}
 	upe->hard = !!hard;
 
 	return nlmsg_end(skb, nlh);
-
-nla_put_failure:
-nlmsg_failure:
-	nlmsg_cancel(skb, nlh);
-	return -EMSGSIZE;
 }
 
 static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, const struct km_event *c)
@@ -2725,13 +2736,13 @@ static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, const struct
 
 static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_event *c)
 {
+	int len = nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
 	struct net *net = xp_net(xp);
 	struct xfrm_userpolicy_info *p;
 	struct xfrm_userpolicy_id *id;
 	struct nlmsghdr *nlh;
 	struct sk_buff *skb;
-	int len = nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
-	int headlen;
+	int headlen, err;
 
 	headlen = sizeof(*p);
 	if (c->event == XFRM_MSG_DELPOLICY) {
@@ -2747,8 +2758,9 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
 		return -ENOMEM;
 
 	nlh = nlmsg_put(skb, c->pid, c->seq, c->event, headlen, 0);
+	err = -EMSGSIZE;
 	if (nlh == NULL)
-		goto nlmsg_failure;
+		goto out_free_skb;
 
 	p = nlmsg_data(nlh);
 	if (c->event == XFRM_MSG_DELPOLICY) {
@@ -2763,29 +2775,29 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
 			memcpy(&id->sel, &xp->selector, sizeof(id->sel));
 
 		attr = nla_reserve(skb, XFRMA_POLICY, sizeof(*p));
+		err = -EMSGSIZE;
 		if (attr == NULL)
-			goto nlmsg_failure;
+			goto out_free_skb;
 
 		p = nla_data(attr);
 	}
 
 	copy_to_user_policy(xp, p, dir);
-	if (copy_to_user_tmpl(xp, skb) < 0)
-		goto nlmsg_failure;
-	if (copy_to_user_policy_type(xp->type, skb) < 0)
-		goto nlmsg_failure;
-
-	if (xfrm_mark_put(skb, &xp->mark))
-		goto nla_put_failure;
+	err = copy_to_user_tmpl(xp, skb);
+	if (!err)
+		err = copy_to_user_policy_type(xp->type, skb);
+	if (!err)
+		err = xfrm_mark_put(skb, &xp->mark);
+	if (err)
+		goto out_free_skb;
 
 	nlmsg_end(skb, nlh);
 
 	return nlmsg_multicast(net->xfrm.nlsk, skb, 0, XFRMNLGRP_POLICY, GFP_ATOMIC);
 
-nla_put_failure:
-nlmsg_failure:
+out_free_skb:
 	kfree_skb(skb);
-	return -1;
+	return err;
 }
 
 static int xfrm_notify_policy_flush(const struct km_event *c)
@@ -2793,24 +2805,27 @@ static int xfrm_notify_policy_flush(const struct km_event *c)
 	struct net *net = c->net;
 	struct nlmsghdr *nlh;
 	struct sk_buff *skb;
+	int err;
 
 	skb = nlmsg_new(userpolicy_type_attrsize(), GFP_ATOMIC);
 	if (skb == NULL)
 		return -ENOMEM;
 
 	nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_FLUSHPOLICY, 0, 0);
+	err = -EMSGSIZE;
 	if (nlh == NULL)
-		goto nlmsg_failure;
-	if (copy_to_user_policy_type(c->data.type, skb) < 0)
-		goto nlmsg_failure;
+		goto out_free_skb;
+	err = copy_to_user_policy_type(c->data.type, skb);
+	if (err)
+		goto out_free_skb;
 
 	nlmsg_end(skb, nlh);
 
 	return nlmsg_multicast(net->xfrm.nlsk, skb, 0, XFRMNLGRP_POLICY, GFP_ATOMIC);
 
-nlmsg_failure:
+out_free_skb:
 	kfree_skb(skb);
-	return -1;
+	return err;
 }
 
 static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, const struct km_event *c)
@@ -2853,15 +2868,14 @@ static int build_report(struct sk_buff *skb, u8 proto,
 	ur->proto = proto;
 	memcpy(&ur->sel, sel, sizeof(ur->sel));
 
-	if (addr &&
-	    nla_put(skb, XFRMA_COADDR, sizeof(*addr), addr))
-		goto nla_put_failure;
-
+	if (addr) {
+		int err = nla_put(skb, XFRMA_COADDR, sizeof(*addr), addr);
+		if (err) {
+			nlmsg_cancel(skb, nlh);
+			return err;
+		}
+	}
 	return nlmsg_end(skb, nlh);
-
-nla_put_failure:
-	nlmsg_cancel(skb, nlh);
-	return -EMSGSIZE;
 }
 
 static int xfrm_send_report(struct net *net, u8 proto,
-- 
1.7.10.2

^ permalink raw reply related
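
The error-propagation pattern used throughout the xfrm_user patch above
can be tried outside the kernel with a small mock. Everything named here
(struct msg_buf, buf_put(), build_msg()) is hypothetical and only stands
in for an skb, nla_put() and a build_*() helper; the point is just that
the caller returns whatever the failing helper returned instead of a
hard-coded -1 or -EMSGSIZE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an skb with limited tailroom. */
struct msg_buf {
	unsigned char data[32];
	size_t len;
};

/* Hypothetical stand-in for nla_put(): 0 on success, negative errno
 * when the attribute does not fit. */
int buf_put(struct msg_buf *b, const void *src, size_t n)
{
	assert(b != NULL && src != NULL);
	if (n > sizeof(b->data) - b->len)
		return -EMSGSIZE;
	memcpy(b->data + b->len, src, n);
	b->len += n;
	return 0;
}

/* Chain the puts and hand back the helper's own error code. */
int build_msg(struct msg_buf *b, const void *a, size_t alen,
	      const void *c, size_t clen)
{
	size_t start = b->len;
	int err;

	err = buf_put(b, a, alen);
	if (!err)
		err = buf_put(b, c, clen);
	if (err) {
		/* Roll back partial output, loosely analogous to nlmsg_cancel(). */
		b->len = start;
		return err;
	}
	return 0;
}
```

The "if (!err) err = ..." chain keeps one exit path for the rollback
without needing a goto label per function.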

* Re: [net-next RFC V3 PATCH 4/6] tuntap: multiqueue support
From: Sridhar Samudrala @ 2012-06-28  4:52 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, habanero, netdev, linux-kernel, krkumar2,
	tahm, akong, davem, shemminger, mashirle
In-Reply-To: <4FEBC936.3080001@redhat.com>

On 6/27/2012 8:02 PM, Jason Wang wrote:
> On 06/27/2012 04:44 PM, Michael S. Tsirkin wrote:
>> On Wed, Jun 27, 2012 at 01:16:30PM +0800, Jason Wang wrote:
>>> On 06/26/2012 06:42 PM, Michael S. Tsirkin wrote:
>>>> On Tue, Jun 26, 2012 at 11:42:17AM +0800, Jason Wang wrote:
>>>>> On 06/25/2012 04:25 PM, Michael S. Tsirkin wrote:
>>>>>> On Mon, Jun 25, 2012 at 02:10:18PM +0800, Jason Wang wrote:
>>>>>>> This patch adds multiqueue support for the tap device. This is done
>>>>>>> by abstracting each queue as a file/socket and allowing multiple
>>>>>>> sockets to be attached to the tuntap device (an array of tun_file
>>>>>>> is stored in the tun_struct). Userspace can write to and read from
>>>>>>> those files to do parallel packet sending/receiving.
>>>>>>>
>>>>>>> Unlike the previous single queue implementation, the socket and
>>>>>>> device are loosely coupled: either of them is allowed to go away
>>>>>>> first. To make the tx path lockless, netif_tx_lock_bh() is
>>>>>>> replaced by RCU/NETIF_F_LLTX to synchronize between the data path
>>>>>>> and system calls.
>>>>>> Don't use LLTX/RCU. It's not worth it.
>>>>>> Use something like netif_set_real_num_tx_queues.
>>>>>>
>>>>>>> The tx queue is selected first based on the recorded rxq index
>>>>>>> of an skb; if there is none, it is chosen based on rx hashing
>>>>>>> (skb_get_rxhash()).
>>>>>>>
>>>>>>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>>>>>> Interestingly macvtap switched to hashing first:
>>>>>> ef0002b577b52941fb147128f30bd1ecfdd3ff6d
>>>>>> (the commit log is corrupted but see what it
>>>>>> does in the patch).
>>>>>> Any idea why?
>>>>> Yes, so tap should be changed to behave the same as macvtap. I remember
>>>>> the reason we do that is to make sure the packets of a single flow are
>>>>> queued to a fixed socket/virtqueue. 10g cards like ixgbe
>>>>> choose the rx queue for a flow based on the last tx queue where the
>>>>> packets of that flow came from. So if we use the recorded rx queue in
>>>>> macvtap, the queue index of a flow would change as the vhost thread
>>>>> moves among processors.
>>>> Hmm. OTOH if you override this, if TX is sent from VCPU0, RX might
>>>> land on VCPU1 in the guest, which is not good, right?
>>> Yes, but better than making the rx move between vcpus when we use
>>> the recorded rx queue.
>> Why isn't this a problem with native TCP?
>> I think what happens is one of the following:
>> - moving between CPUs is more expensive with tun
>>    because it can queue so much data on xmit
>> - scheduler makes very bad decisions about VCPUs
>>    bouncing them around all the time
>
> For a usual native TCP/host process, as it reads and writes tcp sockets,
> it may make sense to move rx to the processor where the process
> moves. But vhost does not do tcp stuff and ixgbe would still move rx
> when the vhost process moves, and we can't even make sure the vhost
> process that handles rx is running on the processor that handles the rx
> interrupt.

We also saw this behavior with the default ixgbe configuration. If vhost
is pinned to a CPU, all packets for that VM are received on a single RX
queue. So even if the VM is doing multiple TCP_RR sessions, packets for
all the flows are received on a single RX queue. Without pinning, vhost
moves around and so do the packets across the RX queues.

I think
         ethtool -K ethX ntuple on
will disable this behavior and it should be possible to program the flow
director using ethtool -U.
This way we can split the packets across the host NIC RX queues based on
the flows, but it is not clear if this would help with the current model
of a single vhost per device.
With per-cpu vhost, each RX queue can be handled by the matching vhost,
but if we have only 1 queue in the VM's virtio-net device, that could
become the bottleneck.
Multi-queue virtio-net should help here, but we need the same number of
queues in the VM's virtio-net device as in the host's NIC so that each
vhost can handle the corresponding virtio queue.
But if the VM has only 2 vcpus, I think it is not efficient to have 8
virtio-net queues (to match a host with 8 physical cpus and 8 RX queues
in the NIC).
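
For reference, the two tx queue selection orders discussed in this
thread (recorded rx queue first versus flow hash first) can be compared
with a small userspace sketch. The struct, the NO_RX_QUEUE marker and
the convention that a hash of 0 means "no hash" are assumptions for
illustration only; this is not the tun or macvtap code.

```c
#include <assert.h>
#include <stdint.h>

#define NO_RX_QUEUE UINT16_MAX	/* hypothetical "no queue recorded" marker */

/* Hypothetical per-packet metadata, standing in for the skb fields. */
struct pkt_meta {
	uint32_t flow_hash;	/* 0 is treated as "no hash" in this sketch */
	uint16_t recorded_rxq;	/* queue the packet arrived on, if any */
};

/* Recorded rx queue first, hash as the fallback. */
uint16_t pick_queue_rxq_first(const struct pkt_meta *p, uint16_t numq)
{
	assert(numq > 0);
	if (p->recorded_rxq != NO_RX_QUEUE)
		return (uint16_t)(p->recorded_rxq % numq);
	return (uint16_t)(p->flow_hash % numq);
}

/* Flow hash first, recorded rx queue as the fallback. */
uint16_t pick_queue_hash_first(const struct pkt_meta *p, uint16_t numq)
{
	assert(numq > 0);
	if (p->flow_hash)
		return (uint16_t)(p->flow_hash % numq);
	if (p->recorded_rxq != NO_RX_QUEUE)
		return (uint16_t)(p->recorded_rxq % numq);
	return 0;
}
```

With hashing first, two packets of the same flow map to the same queue
even if the NIC delivered them on different rx queues; with the recorded
queue first, the choice follows wherever the NIC last steered the flow.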

Thanks
Sridhar

>
>> Could we isolate which it is? Does the problem
>> still happen if you pin VCPUs to host cpus?
>> If not it's the queue depth.
>
> It may not help as tun does not record the vcpu/queue that sent the
> stream, so it can't transmit the packets back to the same vcpu/queue.
>>> Flow steering is needed to keep the tx and
>>> rx on the same vcpu.
>> That involves IPI between processes, so it might be
>> very expensive for kvm.
>>
>>>>> But while testing tun/tap, one interesting thing I found is that even
>>>>> though ixgbe has recorded the queue index during rx, it seems to be lost
>>>>> when tap tries to transmit skbs to userspace.
>>>> dev_pick_tx does this I think but ndo_select_queue
>>>> should be able to get it without trouble.
>>>>
>>>>

^ permalink raw reply

* Re: [PATCH v2 0/5] fec driver updates
From: David Miller @ 2012-06-28  4:30 UTC (permalink / raw)
  To: shawn.guo; +Cc: LW, florian, netdev, linux-arm-kernel, devicetree-discuss
In-Reply-To: <1340804724-29410-1-git-send-email-shawn.guo@linaro.org>

From: Shawn Guo <shawn.guo@linaro.org>
Date: Wed, 27 Jun 2012 21:45:19 +0800

> Changes since v1:
> * Add one patch to use devm_gpio_request_one
> * Have a separate patch to fix phy-reset-gpios property in binding
>   document
> * Change phy-reset-interval to phy-reset-duration
> * Add a sanity check on phy-reset-duration value

All applied.

^ permalink raw reply

* Re: [patch net-next] virtio_net: allow to change mac when iface is running
From: David Miller @ 2012-06-28  4:30 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, virtualization, brouer, mst
In-Reply-To: <1340810866-1017-1-git-send-email-jpirko@redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 27 Jun 2012 17:27:46 +0200

> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Applied, but this seriously makes eth_mac_addr() completely useless.

Technically, every eth_mac_addr() user in a software/virtual device
should behave the way virtio_net does now.

It therefore probably makes sense to add a boolean arg which when true
elides the netif_running() check then fixup and audit every caller.
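
A rough userspace sketch of the boolean-argument idea follows. It is
only an illustration: struct fake_netdev, set_mac_sketch() and the
allow_live flag are made-up names, and the error codes are assumptions
of the sketch rather than a statement about what eth_mac_addr() returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for the parts of a net_device used here. */
struct fake_netdev {
	bool running;
	unsigned char dev_addr[MAC_LEN];
};

/* Reject multicast (low bit of first octet set) and all-zero addresses. */
static bool mac_is_valid(const unsigned char *mac)
{
	static const unsigned char zero[MAC_LEN];

	return !(mac[0] & 1) && memcmp(mac, zero, MAC_LEN) != 0;
}

/* allow_live == true skips the "interface is running" check, which is
 * what a software/virtual device would pass. */
int set_mac_sketch(struct fake_netdev *dev, const unsigned char *mac,
		   bool allow_live)
{
	assert(dev != NULL && mac != NULL);
	if (!allow_live && dev->running)
		return -EBUSY;
	if (!mac_is_valid(mac))
		return -EADDRNOTAVAIL;
	memcpy(dev->dev_addr, mac, MAC_LEN);
	return 0;
}
```

Hardware drivers that cannot change the address while up would keep
passing false, so only the callers need auditing, not the helper.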

^ permalink raw reply

* Re: [PATCH v2 0/4] netdev/phy: 10G PHY support.
From: David Miller @ 2012-06-28  4:29 UTC (permalink / raw)
  To: ddaney.cavm
  Cc: grant.likely, rob.herring, devicetree-discuss, netdev,
	linux-kernel, linux-mips, afleming, david.daney
In-Reply-To: <1340818418-10382-1-git-send-email-ddaney.cavm@gmail.com>

From: David Daney <ddaney.cavm@gmail.com>
Date: Wed, 27 Jun 2012 10:33:34 -0700

> From: David Daney <david.daney@cavium.com>
> 
> The only non-cosmetic change from v1 is to pass an additional argument
> to get_phy_device() that indicates that the PHY uses 802.3 clause 45
> signaling, previously I had been using a high order bit of the addr
> parameter for this.
> 
> There are also changes from v1 in the code and comment formatting.
> These should now be closer to what David Miller prefers.

Applied, but I had to add the following warning fixup:

--------------------
phy: Fix warning in get_phy_device().

drivers/net/phy/phy_device.c: In function ‘get_phy_device’:
drivers/net/phy/phy_device.c:340:14: warning: ‘phy_id’ may be used uninitialized in this function [-Wmaybe-uninitialized]

GCC can't see that when we return zero we always initialize
phy_id and that's the only path where we use it.

Initialize phy_id to zero to shut it up.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/phy/phy_device.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index ef4cdee..47e02e7 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -327,9 +327,9 @@ static int get_phy_id(struct mii_bus *bus, int addr, u32 *phy_id,
  */
 struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45)
 {
-	struct phy_device *dev = NULL;
-	u32 phy_id;
 	struct phy_c45_device_ids c45_ids = {0};
+	struct phy_device *dev = NULL;
+	u32 phy_id = 0;
 	int r;
 
 	r = get_phy_id(bus, addr, &phy_id, is_c45, &c45_ids);
-- 
1.7.10.2


^ permalink raw reply related
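
The shape of code behind the warning fixed above can be reproduced in a
few lines. The function names below are hypothetical and this is not the
phy_device.c code; whether a given GCC version actually warns depends on
the optimization level and inlining, so treat it as a sketch of the
pattern only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writes *id only on the success path. */
int read_id_sketch(int addr, uint32_t *id)
{
	assert(id != 0);
	if (addr < 0)
		return -1;	/* *id is left untouched on failure */
	*id = 0x1000u + (uint32_t)addr;
	return 0;
}

/* The caller reads id only when the helper returned 0, but the compiler
 * may not be able to prove that, so id is initialized explicitly. */
uint32_t lookup_id_sketch(int addr)
{
	uint32_t id = 0;

	if (read_id_sketch(addr, &id))
		return 0;
	return id;
}
```

The explicit initialization does not change behavior on any path; it
only removes the need for the compiler to prove the invariant.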

* Re: [PATCH net-next 4/4] cnic: Handle RAMROD_CMD_ID_CLOSE error.
From: David Miller @ 2012-06-28  4:29 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1340845704-12580-4-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Wed, 27 Jun 2012 18:08:22 -0700

> From: Eddie Wai <eddie.wai@broadcom.com>
> 
> If firmware returns error status, proceed to close the iSCSI connection.
> Update version to 2.5.11.
> 
> Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 3/4] cnic: Remove uio mem[0].
From: David Miller @ 2012-06-28  4:29 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1340845704-12580-3-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Wed, 27 Jun 2012 18:08:21 -0700

> This memory region is no longer used.  Userspace gets the BAR address
> directly from sysfs.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 2/4] cnic: Read bnx2x function number from internal register
From: David Miller @ 2012-06-28  4:28 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1340845704-12580-2-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Wed, 27 Jun 2012 18:08:20 -0700

> From: Eddie Wai <eddie.wai@broadcom.com>
> 
> so that it will work on any hypervisor.
> 
> Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 1/4] cnic: Fix occasional NULL pointer dereference during reboot.
From: David Miller @ 2012-06-28  4:28 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1340845704-12580-1-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Wed, 27 Jun 2012 18:08:19 -0700

> We register with bnx2x before we allocate ctx_tbl structure, so it is
> possible for bnx2x to call cnic_ctl before the structure is allocated.
> This can sometimes cause NULL pointer dereference of cp->ctx_tbl.  We
> fix this by adding simple checking for valid state before proceeding.
> The cnic_ctl call is RCU protected so we don't have to deal with race
> conditions.
> 
> Because of the additional checking, we need to finish the shutdown
> before clearing the CNIC_UP flag.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 2/2] bnx2: Add missing netif_tx_disable() in bnx2_close()
From: David Miller @ 2012-06-28  4:28 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1340845704-12580-6-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Wed, 27 Jun 2012 18:08:24 -0700

> to stop all tx queues.  Update version to 2.2.3.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 1/2] bnx2: Add "fall through" comments
From: David Miller @ 2012-06-28  4:28 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1340845704-12580-5-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Wed, 27 Jun 2012 18:08:23 -0700

> to indicate that the missing break statements are intended.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [net-next RFC V3 PATCH 4/6] tuntap: multiqueue support
From: Jason Wang @ 2012-06-28  3:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: habanero, netdev, linux-kernel, krkumar2, tahm, akong, davem,
	shemminger, mashirle, Eric Dumazet
In-Reply-To: <20120627082635.GB15406@redhat.com>

On 06/27/2012 04:26 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2012 at 01:59:37PM +0800, Jason Wang wrote:
>> On 06/26/2012 07:54 PM, Michael S. Tsirkin wrote:
>>> On Tue, Jun 26, 2012 at 01:52:57PM +0800, Jason Wang wrote:
>>>> On 06/25/2012 04:25 PM, Michael S. Tsirkin wrote:
>>>>> On Mon, Jun 25, 2012 at 02:10:18PM +0800, Jason Wang wrote:
>>>>>> This patch adds multiqueue support for the tap device. This is done by abstracting
>>>>>> each queue as a file/socket and allowing multiple sockets to be attached to the
>>>>>> tuntap device (an array of tun_file is stored in the tun_struct). Userspace
>>>>>> can write to and read from those files to do parallel packet
>>>>>> sending/receiving.
>>>>>>
>>>>>> Unlike the previous single queue implementation, the socket and device are
>>>>>> loosely coupled: either of them is allowed to go away first. To make the
>>>>>> tx path lockless, netif_tx_lock_bh() is replaced by RCU/NETIF_F_LLTX to
>>>>>> synchronize between the data path and system calls.
>>>>> Don't use LLTX/RCU. It's not worth it.
>>>>> Use something like netif_set_real_num_tx_queues.
>>>>>
>>>> For LLTX, maybe it's better to convert it to alloc_netdev_mq() to
>>>> let the kernel see all queues and make queue stopping and
>>>> per-queue stats easier.
>>>> RCU is used to handle attaching/detaching while tun/tap is
>>>> sending and receiving packets, which looks reasonable to me.
>>> Yes but do we have to allow this? How about we always ask
>>> userspace to attach to all active queues?
>> Attaching/detaching is a method to activate/deactivate a queue. If all
>> queues were kept attached, then we would need another method or flag to mark
>> a queue as activated/deactivated, and would still need to synchronize with
>> the data path.
> This is what I am trying to say: use an interface flag for
> multiqueue. When it is set activate all queues attached.
> When unset deactivate all queues except the default one.
>
>
>>>> Not
>>>> sure netif_set_real_num_tx_queues() can help in this situation.
>>> Check it out.
>>>
>>>>>> The tx queue is selected first based on the recorded rxq index of an skb; if
>>>>>> there is none, it is chosen based on rx hashing (skb_get_rxhash()).
>>>>>>
>>>>>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>>>>> Interestingly macvtap switched to hashing first:
>>>>> ef0002b577b52941fb147128f30bd1ecfdd3ff6d
>>>>> (the commit log is corrupted but see what it
>>>>> does in the patch).
>>>>> Any idea why?
>>>>>
>>>>>> ---
>>>>>>   drivers/net/tun.c |  371 +++++++++++++++++++++++++++++++++--------------------
>>>>>>   1 files changed, 232 insertions(+), 139 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>>> index 8233b0a..5c26757 100644
>>>>>> --- a/drivers/net/tun.c
>>>>>> +++ b/drivers/net/tun.c
>>>>>> @@ -107,6 +107,8 @@ struct tap_filter {
>>>>>>   	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
>>>>>>   };
>>>>>>
>>>>>> +#define MAX_TAP_QUEUES (NR_CPUS < 16 ? NR_CPUS : 16)
>>>>> Why the limit? I am guessing you copied this from macvtap?
>>>>> This is problematic for a number of reasons:
>>>>> 	- will not play well with migration
>>>>> 	- will not work well for a large guest
>>>>>
>>>>> Yes, macvtap needs to be fixed too.
>>>>>
>>>>> I am guessing what it is trying to prevent is queueing
>>>>> up a huge number of packets?
>>>>> So just divide the default tx queue limit by the # of queues.
>>>> Not sure,
>>>> other reasons I can guess:
>>>> - to prevent storing a large array of pointers in tun_struct or macvlan_dev.
>>> OK so with the limit of e.g. 1024 we'd allocate at most
>>> 2 pages of memory. This doesn't look too bad. 1024 is probably a
>>> high enough limit: modern hypervisors seem to support on the order
>>> of 100-200 CPUs so this leaves us some breathing space
>>> if we want to match a queue per guest CPU.
>>> Of course we need to limit the packets per queue
>>> in such a setup more aggressively. 1000 packets * 1000 queues
>>> * 64K per packet is too much.
>>>
>>>> - it may not be suitable to allow the number of virtqueues to be greater
>>>> than the number of physical queues in the card
>>> Maybe for macvtap, here we have no idea which card we
>>> are working with and how many queues it has.
>>>
>>>>> And by the way, for MQ applications maybe we can finally
>>>>> ignore tx queue altogether and limit the total number
>>>>> of bytes queued?
>>>>> To avoid regressions we can make it large like 64M/# queues.
>>>>> Could be a separate patch I think, and for a single queue
>>>>> might need a compatible mode though I am not sure.
>>>> Could you explain more about this?
>>>> Did you mean to have a total
>>>> sndbuf for all sockets that are attached to tun/tap?
>>> Consider that we currently limit the # of
>>> packets queued at tun for xmit to userspace.
>>> Some limit is needed but # of packets sounds
>>> very silly - limiting the total memory
>>> might be more reasonable.
>>>
>>> In case of multiqueue, we really care about
>>> total # of packets or total memory, but a simple
>>> approximation could be to divide the allocation
>>> between active queues equally.
>> A possible method is to divide the TUN_READQ_SIZE by #queues, but
>> keep it at least equal to the vring size (256).
> I would not enforce any limit actually.
> Simply divide by # of queues, and
> fail if userspace tries to attach > queue size packets.
>
> With 1000 queues this is 64 Mbyte worst case as is.
> If someone wants to allow userspace to drink
> 256 times as much, that is 16 Gbyte per
> single device, let the user tweak tx queue len.
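
The arithmetic behind the figures above can be checked with a small
user-space sketch. This is only an illustration, not code from the
patch: per_queue_limit() and worst_case_bytes() are hypothetical
helpers, and the 1000-packet budget and 64K maximum packet size are the
round numbers used in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: split a device-wide packet budget evenly across
 * the attached queues. A result of 0 means there are more queues than
 * packets in the budget, which is the case where an attach would fail.
 */
uint32_t per_queue_limit(uint32_t total_pkts, uint32_t nqueues)
{
	assert(nqueues > 0);
	return total_pkts / nqueues;
}

/*
 * Worst-case bytes held by the device if every queue is full and every
 * queued packet has the maximum size. Computed in 64 bits so that the
 * product cannot overflow for realistic inputs.
 */
uint64_t worst_case_bytes(uint32_t nqueues, uint32_t pkts_per_queue,
			  uint32_t max_pkt_bytes)
{
	return (uint64_t)nqueues * pkts_per_queue * max_pkt_bytes;
}
```

With 1000 queues, one packet each and 64K packets this gives
65,536,000 bytes (about 64 Mbyte); allowing 256 packets per queue gives
16,777,216,000 bytes (about 16 Gbyte).
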
>
>
>
>>> qdisc also queues some packets, that logic is
>>> using # of packets anyway. So either make that
>>> 1000/# queues, or even set to 0 as Eric once
>>> suggested.
>>>
>>>>>> +
>>>>>>   struct tun_file {
>>>>>>   	struct sock sk;
>>>>>>   	struct socket socket;
>>>>>> @@ -114,16 +116,18 @@ struct tun_file {
>>>>>>   	int vnet_hdr_sz;
>>>>>>   	struct tap_filter txflt;
>>>>>>   	atomic_t count;
>>>>>> -	struct tun_struct *tun;
>>>>>> +	struct tun_struct __rcu *tun;
>>>>>>   	struct net *net;
>>>>>>   	struct fasync_struct *fasync;
>>>>>>   	unsigned int flags;
>>>>>> +	u16 queue_index;
>>>>>>   };
>>>>>>
>>>>>>   struct tun_sock;
>>>>>>
>>>>>>   struct tun_struct {
>>>>>> -	struct tun_file		*tfile;
>>>>>> +	struct tun_file		*tfiles[MAX_TAP_QUEUES];
>>>>>> +	unsigned int            numqueues;
>>>>>>   	unsigned int 		flags;
>>>>>>   	uid_t			owner;
>>>>>>   	gid_t			group;
>>>>>> @@ -138,80 +142,159 @@ struct tun_struct {
>>>>>>   #endif
>>>>>>   };
>>>>>>
>>>>>> -static int tun_attach(struct tun_struct *tun, struct file *file)
>>>>>> +static DEFINE_SPINLOCK(tun_lock);
>>>>>> +
>>>>>> +/*
>>>>>> + * tun_get_queue(): calculate the queue index
>>>>>> + *     - if skbs comes from mq nics, we can just borrow
>>>>>> + *     - if not, calculate from the hash
>>>>>> + */
>>>>>> +static struct tun_file *tun_get_queue(struct net_device *dev,
>>>>>> +				      struct sk_buff *skb)
>>>>>>   {
>>>>>> -	struct tun_file *tfile = file->private_data;
>>>>>> -	int err;
>>>>>> +	struct tun_struct *tun = netdev_priv(dev);
>>>>>> +	struct tun_file *tfile = NULL;
>>>>>> +	int numqueues = tun->numqueues;
>>>>>> +	__u32 rxq;
>>>>>>
>>>>>> -	ASSERT_RTNL();
>>>>>> +	BUG_ON(!rcu_read_lock_held());
>>>>>>
>>>>>> -	netif_tx_lock_bh(tun->dev);
>>>>>> +	if (!numqueues)
>>>>>> +		goto out;
>>>>>>
>>>>>> -	err = -EINVAL;
>>>>>> -	if (tfile->tun)
>>>>>> +	if (numqueues == 1) {
>>>>>> +		tfile = rcu_dereference(tun->tfiles[0]);
>>>>> Instead of hacks like this, you can ask for an MQ
>>>>> flag to be set in SETIFF. Then you won't need to
>>>>> handle attach/detach at random times.
>>>> Consider a user switching between an SQ guest and an MQ guest: qemu would
>>>> attach or detach the fd, which the kernel could not expect.
>>> Can't userspace keep it attached always, just deactivate MQ?
>>>
>>>>> And most of the scary num_queues checks can go away.
>>>> Even if we have an MQ flag, userspace could still attach just one queue
>>>> to the device.
>>> I think we allow too much flexibility if we let
>>> userspace detach a random queue.
>> The point is to let tun/tap have the same flexibility as macvtap.
>> Macvtap allows adding/deleting queues at any time and it's very easy to
>> add detach/attach to macvtap. So we can easily use almost the same
>> ioctls to activate/deactivate a queue at any time for both tap and
>> macvtap.
> Yes but userspace does not do this in practice:
> it decides how many queues and just activates them all.

The problem here, I think, is:

- We export file descriptors to userspace, so any of the files could
be closed at any time, which the kernel cannot expect.
- It is easy to let tap and macvtap have the same ioctls.
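
To illustrate the first point: because any fd can be closed at any time,
the queue array has to cope with removal of an arbitrary slot. The
quoted patch handles this in tun_detach() by moving the last entry into
the vacated slot. Below is a minimal user-space sketch of that idea
only. The names (struct queue, struct device, dev_attach(),
dev_detach(), MAX_QUEUES) are made up for the example, and it leaves out
the locking and RCU that the real code needs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUES 16	/* stand-in for MAX_TAP_QUEUES */

struct queue {
	int index;	/* slot this queue currently occupies */
	int attached;	/* non-zero while the queue is in the array */
};

struct device {
	struct queue *slots[MAX_QUEUES];
	int numqueues;
};

/* Attach q at the end of the array; -1 if full or already attached. */
int dev_attach(struct device *d, struct queue *q)
{
	if (q->attached || d->numqueues == MAX_QUEUES)
		return -1;
	q->index = d->numqueues;
	d->slots[d->numqueues++] = q;
	q->attached = 1;
	return 0;
}

/*
 * Detach any queue: the last entry is moved into the hole, so slots
 * [0, numqueues) stay dense and lookups by index keep working.
 */
int dev_detach(struct device *d, struct queue *q)
{
	int last;

	if (!q->attached)
		return -1;
	assert(q->index < d->numqueues);
	last = d->numqueues - 1;
	d->slots[q->index] = d->slots[last];
	d->slots[q->index]->index = q->index;
	d->slots[last] = NULL;
	q->attached = 0;
	d->numqueues = last;
	return 0;
}
```

Detaching a queue in the middle renumbers only the queue that was last,
which is why each queue has to carry its own index.
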
>
>
[...]

^ permalink raw reply

* Re: [net-next RFC V3 PATCH 4/6] tuntap: multiqueue support
From: Jason Wang @ 2012-06-28  3:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: habanero, netdev, linux-kernel, krkumar2, tahm, akong, davem,
	shemminger, mashirle
In-Reply-To: <20120627084431.GC15406@redhat.com>

On 06/27/2012 04:44 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2012 at 01:16:30PM +0800, Jason Wang wrote:
>> On 06/26/2012 06:42 PM, Michael S. Tsirkin wrote:
>>> On Tue, Jun 26, 2012 at 11:42:17AM +0800, Jason Wang wrote:
>>>> On 06/25/2012 04:25 PM, Michael S. Tsirkin wrote:
>>>>> On Mon, Jun 25, 2012 at 02:10:18PM +0800, Jason Wang wrote:
>>>>>> This patch adds multiqueue support for tap device. This is done by abstracting
>>>>>> each queue as a file/socket and allowing multiple sockets to be attached to the
>>>>>> tuntap device (an array of tun_file is stored in the tun_struct). Userspace
>>>>>> can write to and read from those files to do parallel packet
>>>>>> sending/receiving.
>>>>>>
>>>>>> Unlike the previous single queue implementation, the socket and device are
>>>>>> loosely coupled, and either of them is allowed to go away first. In order to make
>>>>>> the tx path lockless, netif_tx_lock_bh() is replaced by RCU/NETIF_F_LLTX to
>>>>>> synchronize between the data path and system calls.
>>>>> Don't use LLTX/RCU. It's not worth it.
>>>>> Use something like netif_set_real_num_tx_queues.
>>>>>
>>>>>> The tx queue is selected first based on the recorded rxq index of an skb; if
>>>>>> there is none, it is chosen based on rx hashing (skb_get_rxhash()).
>>>>>>
>>>>>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>>>>> Interestingly macvtap switched to hashing first:
>>>>> ef0002b577b52941fb147128f30bd1ecfdd3ff6d
>>>>> (the commit log is corrupted but see what it
>>>>> does in the patch).
>>>>> Any idea why?
>>>> Yes, so tap should be changed to behave the same as macvtap. I remember
>>>> the reason we did that is to make sure the packets of a single flow are
>>>> queued to a fixed socket/virtqueue. 10g cards like ixgbe
>>>> choose the rx queue for a flow based on the last tx queue that the
>>>> packets of that flow came from. So if we use the recorded rx queue in
>>>> macvtap, the queue index of a flow would change as the vhost thread
>>>> moves among processors.
>>> Hmm. OTOH if you override this, if TX is sent from VCPU0, RX might land
>>> on VCPU1 in the guest, which is not good, right?
>> Yes, but that is better than making rx move between vcpus when we use
>> the recorded rx queue.
> Why isn't this a problem with native TCP?
> I think what happens is one of the following:
> - moving between CPUs is more expensive with tun
>    because it can queue so much data on xmit
> - scheduler makes very bad decisions about VCPUs
>    bouncing them around all the time

A usual native TCP/host process reads and writes tcp sockets,
so it makes sense to move rx to the processor where the process
moves. But vhost does not do any tcp work, and ixgbe would still move rx
when the vhost process moves, and we can't even make sure that the vhost
process handling rx is running on the processor that handles the rx interrupt.

> Could we isolate which it is? Does the problem
> still happen if you pin VCPUs to host cpus?
> If not it's the queue depth.

It may not help, as tun does not record the vcpu/queue that sent the
stream, so it can't transmit the packets back to the same vcpu/queue.
>> Flow steering is needed to make sure the tx and
>> rx are on the same vcpu.
> That involves IPI between processes, so it might be
> very expensive for kvm.
>
>>>> But while testing tun/tap, one interesting thing I found is that even though
>>>> ixgbe has recorded the queue index during rx, it seems to be lost when
>>>> tap tries to transmit skbs to userspace.
>>> dev_pick_tx does this I think but ndo_select_queue
>>> should be able to get it without trouble.
>>>
>>>
>>>>>> ---
>>>>>>   drivers/net/tun.c |  371 +++++++++++++++++++++++++++++++++--------------------
>>>>>>   1 files changed, 232 insertions(+), 139 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>>> index 8233b0a..5c26757 100644
>>>>>> --- a/drivers/net/tun.c
>>>>>> +++ b/drivers/net/tun.c
>>>>>> @@ -107,6 +107,8 @@ struct tap_filter {
>>>>>>   	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
>>>>>>   };
>>>>>>
>>>>>> +#define MAX_TAP_QUEUES (NR_CPUS < 16 ? NR_CPUS : 16)
>>>>> Why the limit? I am guessing you copied this from macvtap?
>>>>> This is problematic for a number of reasons:
>>>>> 	- will not play well with migration
>>>>> 	- will not work well for a large guest
>>>>>
>>>>> Yes, macvtap needs to be fixed too.
>>>>>
>>>>> I am guessing what it is trying to prevent is queueing
>>>>> up a huge number of packets?
>>>>> So just divide the default tx queue limit by the # of queues.
>>>>>
>>>>> And by the way, for MQ applications maybe we can finally
>>>>> ignore tx queue altogether and limit the total number
>>>>> of bytes queued?
>>>>> To avoid regressions we can make it large like 64M/# queues.
>>>>> Could be a separate patch I think, and for a single queue
>>>>> might need a compatible mode though I am not sure.
>>>>>
>>>>>> +
>>>>>>   struct tun_file {
>>>>>>   	struct sock sk;
>>>>>>   	struct socket socket;
>>>>>> @@ -114,16 +116,18 @@ struct tun_file {
>>>>>>   	int vnet_hdr_sz;
>>>>>>   	struct tap_filter txflt;
>>>>>>   	atomic_t count;
>>>>>> -	struct tun_struct *tun;
>>>>>> +	struct tun_struct __rcu *tun;
>>>>>>   	struct net *net;
>>>>>>   	struct fasync_struct *fasync;
>>>>>>   	unsigned int flags;
>>>>>> +	u16 queue_index;
>>>>>>   };
>>>>>>
>>>>>>   struct tun_sock;
>>>>>>
>>>>>>   struct tun_struct {
>>>>>> -	struct tun_file		*tfile;
>>>>>> +	struct tun_file		*tfiles[MAX_TAP_QUEUES];
>>>>>> +	unsigned int            numqueues;
>>>>>>   	unsigned int 		flags;
>>>>>>   	uid_t			owner;
>>>>>>   	gid_t			group;
>>>>>> @@ -138,80 +142,159 @@ struct tun_struct {
>>>>>>   #endif
>>>>>>   };
>>>>>>
>>>>>> -static int tun_attach(struct tun_struct *tun, struct file *file)
>>>>>> +static DEFINE_SPINLOCK(tun_lock);
>>>>>> +
>>>>>> +/*
>>>>>> + * tun_get_queue(): calculate the queue index
>>>>>> + *     - if skbs comes from mq nics, we can just borrow
>>>>>> + *     - if not, calculate from the hash
>>>>>> + */
>>>>>> +static struct tun_file *tun_get_queue(struct net_device *dev,
>>>>>> +				      struct sk_buff *skb)
>>>>>>   {
>>>>>> -	struct tun_file *tfile = file->private_data;
>>>>>> -	int err;
>>>>>> +	struct tun_struct *tun = netdev_priv(dev);
>>>>>> +	struct tun_file *tfile = NULL;
>>>>>> +	int numqueues = tun->numqueues;
>>>>>> +	__u32 rxq;
>>>>>>
>>>>>> -	ASSERT_RTNL();
>>>>>> +	BUG_ON(!rcu_read_lock_held());
>>>>>>
>>>>>> -	netif_tx_lock_bh(tun->dev);
>>>>>> +	if (!numqueues)
>>>>>> +		goto out;
>>>>>>
>>>>>> -	err = -EINVAL;
>>>>>> -	if (tfile->tun)
>>>>>> +	if (numqueues == 1) {
>>>>>> +		tfile = rcu_dereference(tun->tfiles[0]);
>>>>> Instead of hacks like this, you can ask for an MQ
>>>>> flag to be set in SETIFF. Then you won't need to
>>>>> handle attach/detach at random times.
>>>>> And most of the scary num_queues checks can go away.
>>>>> You can then also ask userspace about the max # of queues
>>>>> to expect if you want to save some memory.
>>>>>
>>>>>
>>>>>>   		goto out;
>>>>>> +	}
>>>>>>
>>>>>> -	err = -EBUSY;
>>>>>> -	if (tun->tfile)
>>>>>> +	if (likely(skb_rx_queue_recorded(skb))) {
>>>>>> +		rxq = skb_get_rx_queue(skb);
>>>>>> +
>>>>>> +		while (unlikely(rxq >= numqueues))
>>>>>> +			rxq -= numqueues;
>>>>>> +
>>>>>> +		tfile = rcu_dereference(tun->tfiles[rxq]);
>>>>>>   		goto out;
>>>>>> +	}
>>>>>>
>>>>>> -	err = 0;
>>>>>> -	tfile->tun = tun;
>>>>>> -	tun->tfile = tfile;
>>>>>> -	netif_carrier_on(tun->dev);
>>>>>> -	dev_hold(tun->dev);
>>>>>> -	sock_hold(&tfile->sk);
>>>>>> -	atomic_inc(&tfile->count);
>>>>>> +	/* Check if we can use flow to select a queue */
>>>>>> +	rxq = skb_get_rxhash(skb);
>>>>>> +	if (rxq) {
>>>>>> +		u32 idx = ((u64)rxq * numqueues) >> 32;
>>>>> This completely confuses me. What's the logic here?
>>>>> How do we even know it's in range?
>>>>>
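
On the range question: the expression scales a 32-bit hash into
[0, numqueues) without a division. Since the hash is at most 2^32 - 1,
the 64-bit product hash * numqueues is strictly less than
numqueues * 2^32, so shifting right by 32 always yields a value below
numqueues. A minimal user-space sketch, where hash_to_queue() is a
hypothetical name and not a function from the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a 32-bit hash to a queue index in [0, numqueues).
 * The multiplication is done in 64 bits; the top 32 bits of the
 * product are floor(hash * numqueues / 2^32), which is < numqueues.
 */
uint32_t hash_to_queue(uint32_t hash, uint32_t numqueues)
{
	assert(numqueues > 0);
	return (uint32_t)(((uint64_t)hash * numqueues) >> 32);
}
```

The largest possible hash, 0xffffffff, maps to numqueues - 1, and a hash
of 0 maps to queue 0. Note that in the quoted code a hash of 0 is
treated as "no hash" and falls through to queue 0 anyway.
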
>>>>>> +		tfile = rcu_dereference(tun->tfiles[idx]);
>>>>>> +		goto out;
>>>>>> +	}
>>>>>>
>>>>>> +	tfile = rcu_dereference(tun->tfiles[0]);
>>>>>>   out:
>>>>>> -	netif_tx_unlock_bh(tun->dev);
>>>>>> -	return err;
>>>>>> +	return tfile;
>>>>>>   }
>>>>>>
>>>>>> -static void __tun_detach(struct tun_struct *tun)
>>>>>> +static int tun_detach(struct tun_file *tfile, bool clean)
>>>>>>   {
>>>>>> -	struct tun_file *tfile = tun->tfile;
>>>>>> -	/* Detach from net device */
>>>>>> -	netif_tx_lock_bh(tun->dev);
>>>>>> -	netif_carrier_off(tun->dev);
>>>>>> -	tun->tfile = NULL;
>>>>>> -	netif_tx_unlock_bh(tun->dev);
>>>>>> -
>>>>>> -	/* Drop read queue */
>>>>>> -	skb_queue_purge(&tfile->socket.sk->sk_receive_queue);
>>>>>> -
>>>>>> -	/* Drop the extra count on the net device */
>>>>>> -	dev_put(tun->dev);
>>>>>> -}
>>>>>> +	struct tun_struct *tun;
>>>>>> +	struct net_device *dev = NULL;
>>>>>> +	bool destroy = false;
>>>>>>
>>>>>> -static void tun_detach(struct tun_struct *tun)
>>>>>> -{
>>>>>> -	rtnl_lock();
>>>>>> -	__tun_detach(tun);
>>>>>> -	rtnl_unlock();
>>>>>> -}
>>>>>> +	spin_lock(&tun_lock);
>>>>>>
>>>>>> -static struct tun_struct *__tun_get(struct tun_file *tfile)
>>>>>> -{
>>>>>> -	struct tun_struct *tun = NULL;
>>>>>> +	tun = rcu_dereference_protected(tfile->tun,
>>>>>> +					lockdep_is_held(&tun_lock));
>>>>>> +	if (tun) {
>>>>>> +		u16 index = tfile->queue_index;
>>>>>> +		BUG_ON(index >= tun->numqueues);
>>>>>> +		dev = tun->dev;
>>>>>> +
>>>>>> +		rcu_assign_pointer(tun->tfiles[index],
>>>>>> +				   tun->tfiles[tun->numqueues - 1]);
>>>>>> +		tun->tfiles[index]->queue_index = index;
>>>>>> +		rcu_assign_pointer(tfile->tun, NULL);
>>>>>> +		--tun->numqueues;
>>>>>> +		sock_put(&tfile->sk);
>>>>>>
>>>>>> -	if (atomic_inc_not_zero(&tfile->count))
>>>>>> -		tun = tfile->tun;
>>>>>> +		if (tun->numqueues == 0 && !(tun->flags & TUN_PERSIST))
>>>>>> +			destroy = true;
>>>>> Please don't use flags like that. Use dedicated labels and goto there on error.
>>>>>
>>>>>
>>>>>> +	}
>>>>>>
>>>>>> -	return tun;
>>>>>> +	spin_unlock(&tun_lock);
>>>>>> +
>>>>>> +	synchronize_rcu();
>>>>>> +	if (clean)
>>>>>> +		sock_put(&tfile->sk);
>>>>>> +
>>>>>> +	if (destroy) {
>>>>>> +		rtnl_lock();
>>>>>> +		if (dev->reg_state == NETREG_REGISTERED)
>>>>>> +			unregister_netdevice(dev);
>>>>>> +		rtnl_unlock();
>>>>>> +	}
>>>>>> +
>>>>>> +	return 0;
>>>>>>   }
>>>>>>
>>>>>> -static struct tun_struct *tun_get(struct file *file)
>>>>>> +static void tun_detach_all(struct net_device *dev)
>>>>>>   {
>>>>>> -	return __tun_get(file->private_data);
>>>>>> +	struct tun_struct *tun = netdev_priv(dev);
>>>>>> +	struct tun_file *tfile, *tfile_list[MAX_TAP_QUEUES];
>>>>>> +	int i, j = 0;
>>>>>> +
>>>>>> +	spin_lock(&tun_lock);
>>>>>> +
>>>>>> +	for (i = 0; i < MAX_TAP_QUEUES && tun->numqueues; i++) {
>>>>>> +		tfile = rcu_dereference_protected(tun->tfiles[i],
>>>>>> +						lockdep_is_held(&tun_lock));
>>>>>> +		BUG_ON(!tfile);
>>>>>> +		wake_up_all(&tfile->wq.wait);
>>>>>> +		tfile_list[j++] = tfile;
>>>>>> +		rcu_assign_pointer(tfile->tun, NULL);
>>>>>> +		--tun->numqueues;
>>>>>> +	}
>>>>>> +	BUG_ON(tun->numqueues != 0);
>>>>>> +	/* guarantee that any future tun_attach will fail */
>>>>>> +	tun->numqueues = MAX_TAP_QUEUES;
>>>>>> +	spin_unlock(&tun_lock);
>>>>>> +
>>>>>> +	synchronize_rcu();
>>>>>> +	for (--j; j >= 0; j--)
>>>>>> +		sock_put(&tfile_list[j]->sk);
>>>>>>   }
>>>>>>
>>>>>> -static void tun_put(struct tun_struct *tun)
>>>>>> +static int tun_attach(struct tun_struct *tun, struct file *file)
>>>>>>   {
>>>>>> -	struct tun_file *tfile = tun->tfile;
>>>>>> +	struct tun_file *tfile = file->private_data;
>>>>>> +	int err;
>>>>>> +
>>>>>> +	ASSERT_RTNL();
>>>>>> +
>>>>>> +	spin_lock(&tun_lock);
>>>>>>
>>>>>> -	if (atomic_dec_and_test(&tfile->count))
>>>>>> -		tun_detach(tfile->tun);
>>>>>> +	err = -EINVAL;
>>>>>> +	if (rcu_dereference_protected(tfile->tun, lockdep_is_held(&tun_lock)))
>>>>>> +		goto out;
>>>>>> +
>>>>>> +	err = -EBUSY;
>>>>>> +	if (!(tun->flags & TUN_TAP_MQ) && tun->numqueues == 1)
>>>>>> +		goto out;
>>>>>> +
>>>>>> +	if (tun->numqueues == MAX_TAP_QUEUES)
>>>>>> +		goto out;
>>>>>> +
>>>>>> +	err = 0;
>>>>>> +	tfile->queue_index = tun->numqueues;
>>>>>> +	rcu_assign_pointer(tfile->tun, tun);
>>>>>> +	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>>>>>> +	sock_hold(&tfile->sk);
>>>>>> +	tun->numqueues++;
>>>>>> +
>>>>>> +	if (tun->numqueues == 1)
>>>>>> +		netif_carrier_on(tun->dev);
>>>>>> +
>>>>>> +	/* device is allowed to go away first, so no need to hold extra
>>>>>> +	 * refcnt. */
>>>>>> +
>>>>>> +out:
>>>>>> +	spin_unlock(&tun_lock);
>>>>>> +	return err;
>>>>>>   }
>>>>>>
>>>>>>   /* TAP filtering */
>>>>>> @@ -331,16 +414,7 @@ static const struct ethtool_ops tun_ethtool_ops;
>>>>>>   /* Net device detach from fd. */
>>>>>>   static void tun_net_uninit(struct net_device *dev)
>>>>>>   {
>>>>>> -	struct tun_struct *tun = netdev_priv(dev);
>>>>>> -	struct tun_file *tfile = tun->tfile;
>>>>>> -
>>>>>> -	/* Inform the methods they need to stop using the dev.
>>>>>> -	 */
>>>>>> -	if (tfile) {
>>>>>> -		wake_up_all(&tfile->wq.wait);
>>>>>> -		if (atomic_dec_and_test(&tfile->count))
>>>>>> -			__tun_detach(tun);
>>>>>> -	}
>>>>>> +	tun_detach_all(dev);
>>>>>>   }
>>>>>>
>>>>>>   /* Net device open. */
>>>>>> @@ -360,10 +434,10 @@ static int tun_net_close(struct net_device *dev)
>>>>>>   /* Net device start xmit */
>>>>>>   static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>>>   {
>>>>>> -	struct tun_struct *tun = netdev_priv(dev);
>>>>>> -	struct tun_file *tfile = tun->tfile;
>>>>>> +	struct tun_file *tfile = NULL;
>>>>>>
>>>>>> -	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
>>>>>> +	rcu_read_lock();
>>>>>> +	tfile = tun_get_queue(dev, skb);
>>>>>>
>>>>>>   	/* Drop packet if interface is not attached */
>>>>>>   	if (!tfile)
>>>>>> @@ -381,7 +455,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>>>
>>>>>>   	if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
>>>>>>   	>= dev->tx_queue_len) {
>>>>>> -		if (!(tun->flags & TUN_ONE_QUEUE)) {
>>>>>> +		if (!(tfile->flags & TUN_ONE_QUEUE) &&
>>>>> Which patch moved flags from tun to tfile?
>>>>>
>>>>>> +		    !(tfile->flags & TUN_TAP_MQ)) {
>>>>>>   			/* Normal queueing mode. */
>>>>>>   			/* Packet scheduler handles dropping of further packets. */
>>>>>>   			netif_stop_queue(dev);
>>>>>> @@ -390,7 +465,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>>>   			 * error is more appropriate. */
>>>>>>   			dev->stats.tx_fifo_errors++;
>>>>>>   		} else {
>>>>>> -			/* Single queue mode.
>>>>>> +			/* Single queue mode or multi queue mode.
>>>>>>   			 * Driver handles dropping of all packets itself. */
>>>>> Please don't do this. Stop the queue on overrun as appropriate.
>>>>> ONE_QUEUE is a legacy hack.
>>>>>
>>>>> BTW we really should stop queue before we start dropping packets,
>>>>> but that can be a separate patch.
>>>>>
>>>>>>   			goto drop;
>>>>>>   		}
>>>>>> @@ -408,9 +483,11 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>>>   		kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
>>>>>>   	wake_up_interruptible_poll(&tfile->wq.wait, POLLIN |
>>>>>>   				   POLLRDNORM | POLLRDBAND);
>>>>>> +	rcu_read_unlock();
>>>>>>   	return NETDEV_TX_OK;
>>>>>>
>>>>>>   drop:
>>>>>> +	rcu_read_unlock();
>>>>>>   	dev->stats.tx_dropped++;
>>>>>>   	kfree_skb(skb);
>>>>>>   	return NETDEV_TX_OK;
>>>>>> @@ -527,16 +604,22 @@ static void tun_net_init(struct net_device *dev)
>>>>>>   static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
>>>>>>   {
>>>>>>   	struct tun_file *tfile = file->private_data;
>>>>>> -	struct tun_struct *tun = __tun_get(tfile);
>>>>>> +	struct tun_struct *tun = NULL;
>>>>>>   	struct sock *sk;
>>>>>>   	unsigned int mask = 0;
>>>>>>
>>>>>> -	if (!tun)
>>>>>> +	if (!tfile)
>>>>>>   		return POLLERR;
>>>>>>
>>>>>> -	sk = tfile->socket.sk;
>>>>>> +	rcu_read_lock();
>>>>>> +	tun = rcu_dereference(tfile->tun);
>>>>>> +	if (!tun) {
>>>>>> +		rcu_read_unlock();
>>>>>> +		return POLLERR;
>>>>>> +	}
>>>>>> +	rcu_read_unlock();
>>>>>>
>>>>>> -	tun_debug(KERN_INFO, tun, "tun_chr_poll\n");
>>>>>> +	sk = &tfile->sk;
>>>>>>
>>>>>>   	poll_wait(file, &tfile->wq.wait, wait);
>>>>>>
>>>>>> @@ -548,10 +631,12 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
>>>>>>   	     sock_writeable(sk)))
>>>>>>   		mask |= POLLOUT | POLLWRNORM;
>>>>>>
>>>>>> -	if (tun->dev->reg_state != NETREG_REGISTERED)
>>>>>> +	rcu_read_lock();
>>>>>> +	tun = rcu_dereference(tfile->tun);
>>>>>> +	if (!tun || tun->dev->reg_state != NETREG_REGISTERED)
>>>>>>   		mask = POLLERR;
>>>>>> +	rcu_read_unlock();
>>>>>>
>>>>>> -	tun_put(tun);
>>>>>>   	return mask;
>>>>>>   }
>>>>>>
>>>>>> @@ -708,9 +793,12 @@ static ssize_t tun_get_user(struct tun_file *tfile,
>>>>>>   		skb_shinfo(skb)->gso_segs = 0;
>>>>>>   	}
>>>>>>
>>>>>> -	tun = __tun_get(tfile);
>>>>>> -	if (!tun)
>>>>>> +	rcu_read_lock();
>>>>>> +	tun = rcu_dereference(tfile->tun);
>>>>>> +	if (!tun) {
>>>>>> +		rcu_read_unlock();
>>>>>>   		return -EBADFD;
>>>>>> +	}
>>>>>>
>>>>>>   	switch (tfile->flags & TUN_TYPE_MASK) {
>>>>>>   	case TUN_TUN_DEV:
>>>>>> @@ -720,26 +808,30 @@ static ssize_t tun_get_user(struct tun_file *tfile,
>>>>>>   		skb->protocol = eth_type_trans(skb, tun->dev);
>>>>>>   		break;
>>>>>>   	}
>>>>>> -
>>>>>> -	netif_rx_ni(skb);
>>>>>>   	tun->dev->stats.rx_packets++;
>>>>>>   	tun->dev->stats.rx_bytes += len;
>>>>>> -	tun_put(tun);
>>>>>> +	rcu_read_unlock();
>>>>>> +
>>>>>> +	netif_rx_ni(skb);
>>>>>> +
>>>>>>   	return count;
>>>>>>
>>>>>>   err_free:
>>>>>>   	count = -EINVAL;
>>>>>>   	kfree_skb(skb);
>>>>>>   err:
>>>>>> -	tun = __tun_get(tfile);
>>>>>> -	if (!tun)
>>>>>> +	rcu_read_lock();
>>>>>> +	tun = rcu_dereference(tfile->tun);
>>>>>> +	if (!tun) {
>>>>>> +		rcu_read_unlock();
>>>>>>   		return -EBADFD;
>>>>>> +	}
>>>>>>
>>>>>>   	if (drop)
>>>>>>   		tun->dev->stats.rx_dropped++;
>>>>>>   	if (error)
>>>>>>   		tun->dev->stats.rx_frame_errors++;
>>>>>> -	tun_put(tun);
>>>>>> +	rcu_read_unlock();
>>>>>>   	return count;
>>>>>>   }
>>>>>>
>>>>>> @@ -833,12 +925,13 @@ static ssize_t tun_put_user(struct tun_file *tfile,
>>>>>>   	skb_copy_datagram_const_iovec(skb, 0, iv, total, len);
>>>>>>   	total += skb->len;
>>>>>>
>>>>>> -	tun = __tun_get(tfile);
>>>>>> +	rcu_read_lock();
>>>>>> +	tun = rcu_dereference(tfile->tun);
>>>>>>   	if (tun) {
>>>>>>   		tun->dev->stats.tx_packets++;
>>>>>>   		tun->dev->stats.tx_bytes += len;
>>>>>> -		tun_put(tun);
>>>>>>   	}
>>>>>> +	rcu_read_unlock();
>>>>>>
>>>>>>   	return total;
>>>>>>   }
>>>>>> @@ -869,28 +962,31 @@ static ssize_t tun_do_read(struct tun_file *tfile,
>>>>>>   				break;
>>>>>>   			}
>>>>>>
>>>>>> -			tun = __tun_get(tfile);
>>>>>> +			rcu_read_lock();
>>>>>> +			tun = rcu_dereference(tfile->tun);
>>>>>>   			if (!tun) {
>>>>>> -				ret = -EIO;
>>>>>> +				ret = -EBADFD;
>>>>> BADFD is for when you get passed something like -1 fd.
>>>>> Here fd is OK, it's just in a bad state so you can not do IO.
>>>>>
>>>>>
>>>>>> +				rcu_read_unlock();
>>>>>>   				break;
>>>>>>   			}
>>>>>>   			if (tun->dev->reg_state != NETREG_REGISTERED) {
>>>>>>   				ret = -EIO;
>>>>>> -				tun_put(tun);
>>>>>> +				rcu_read_unlock();
>>>>>>   				break;
>>>>>>   			}
>>>>>> -			tun_put(tun);
>>>>>> +			rcu_read_unlock();
>>>>>>
>>>>>>   			/* Nothing to read, let's sleep */
>>>>>>   			schedule();
>>>>>>   			continue;
>>>>>>   		}
>>>>>>
>>>>>> -		tun = __tun_get(tfile);
>>>>>> +		rcu_read_lock();
>>>>>> +		tun = rcu_dereference(tfile->tun);
>>>>>>   		if (tun) {
>>>>>>   			netif_wake_queue(tun->dev);
>>>>>> -			tun_put(tun);
>>>>>>   		}
>>>>>> +		rcu_read_unlock();
>>>>>>
>>>>>>   		ret = tun_put_user(tfile, skb, iv, len);
>>>>>>   		kfree_skb(skb);
>>>>>> @@ -1038,6 +1134,9 @@ static int tun_flags(struct tun_struct *tun)
>>>>>>   	if (tun->flags & TUN_VNET_HDR)
>>>>>>   		flags |= IFF_VNET_HDR;
>>>>>>
>>>>>> +	if (tun->flags & TUN_TAP_MQ)
>>>>>> +		flags |= IFF_MULTI_QUEUE;
>>>>>> +
>>>>>>   	return flags;
>>>>>>   }
>>>>>>
>>>>>> @@ -1097,8 +1196,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>>>>>   		err = tun_attach(tun, file);
>>>>>>   		if (err < 0)
>>>>>>   			return err;
>>>>>> -	}
>>>>>> -	else {
>>>>>> +	} else {
>>>>>>   		char *name;
>>>>>>   		unsigned long flags = 0;
>>>>>>
>>>>>> @@ -1142,6 +1240,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>>>>>   		dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
>>>>>>   			TUN_USER_FEATURES;
>>>>>>   		dev->features = dev->hw_features;
>>>>>> +		if (ifr->ifr_flags & IFF_MULTI_QUEUE)
>>>>>> +			dev->features |= NETIF_F_LLTX;
>>>>>>
>>>>>>   		err = register_netdevice(tun->dev);
>>>>>>   		if (err < 0)
>>>>>> @@ -1154,7 +1254,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>>>>>
>>>>>>   		err = tun_attach(tun, file);
>>>>>>   		if (err < 0)
>>>>>> -			goto failed;
>>>>>> +			goto err_free_dev;
>>>>>>   	}
>>>>>>
>>>>>>   	tun_debug(KERN_INFO, tun, "tun_set_iff\n");
>>>>>> @@ -1174,6 +1274,11 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>>>>>   	else
>>>>>>   		tun->flags &= ~TUN_VNET_HDR;
>>>>>>
>>>>>> +	if (ifr->ifr_flags & IFF_MULTI_QUEUE)
>>>>>> +		tun->flags |= TUN_TAP_MQ;
>>>>>> +	else
>>>>>> +		tun->flags &= ~TUN_TAP_MQ;
>>>>>> +
>>>>>>   	/* Cache flags from tun device */
>>>>>>   	tfile->flags = tun->flags;
>>>>>>   	/* Make sure persistent devices do not get stuck in
>>>>>> @@ -1187,7 +1292,6 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>>>>>>
>>>>>>   err_free_dev:
>>>>>>   	free_netdev(dev);
>>>>>> -failed:
>>>>>>   	return err;
>>>>>>   }
>>>>>>
>>>>>> @@ -1264,38 +1368,40 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
>>>>>>   				(unsigned int __user*)argp);
>>>>>>   	}
>>>>>>
>>>>>> -	rtnl_lock();
>>>>>> -
>>>>>> -	tun = __tun_get(tfile);
>>>>>> -	if (cmd == TUNSETIFF && !tun) {
>>>>>> +	ret = 0;
>>>>>> +	if (cmd == TUNSETIFF) {
>>>>>> +		rtnl_lock();
>>>>>>   		ifr.ifr_name[IFNAMSIZ-1] = '\0';
>>>>>> -
>>>>>>   		ret = tun_set_iff(tfile->net, file, &ifr);
>>>>>> -
>>>>>> +		rtnl_unlock();
>>>>>>   		if (ret)
>>>>>> -			goto unlock;
>>>>>> -
>>>>>> +			return ret;
>>>>>>   		if (copy_to_user(argp, &ifr, ifreq_len))
>>>>>> -			ret = -EFAULT;
>>>>>> -		goto unlock;
>>>>>> +			return -EFAULT;
>>>>>> +		return ret;
>>>>>>   	}
>>>>>>
>>>>>> +	rtnl_lock();
>>>>>> +
>>>>>> +	rcu_read_lock();
>>>>>> +
>>>>>>   	ret = -EBADFD;
>>>>>> +	tun = rcu_dereference(tfile->tun);
>>>>>>   	if (!tun)
>>>>>>   		goto unlock;
>>>>>> +	else
>>>>>> +		ret = 0;
>>>>>>
>>>>>> -	tun_debug(KERN_INFO, tun, "tun_chr_ioctl cmd %d\n", cmd);
>>>>>> -
>>>>>> -	ret = 0;
>>>>>>   	switch (cmd) {
>>>>>>   	case TUNGETIFF:
>>>>>>   		ret = tun_get_iff(current->nsproxy->net_ns, tun, &ifr);
>>>>>> +		rcu_read_unlock();
>>>>>>   		if (ret)
>>>>>> -			break;
>>>>>> +			goto out;
>>>>>>
>>>>>>   		if (copy_to_user(argp, &ifr, ifreq_len))
>>>>>>   			ret = -EFAULT;
>>>>>> -		break;
>>>>>> +		goto out;
>>>>>>
>>>>>>   	case TUNSETNOCSUM:
>>>>>>   		/* Disable/Enable checksum */
>>>>>> @@ -1357,9 +1463,10 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
>>>>>>   		/* Get hw address */
>>>>>>   		memcpy(ifr.ifr_hwaddr.sa_data, tun->dev->dev_addr, ETH_ALEN);
>>>>>>   		ifr.ifr_hwaddr.sa_family = tun->dev->type;
>>>>>> +		rcu_read_unlock();
>>>>>>   		if (copy_to_user(argp, &ifr, ifreq_len))
>>>>>>   			ret = -EFAULT;
>>>>>> -		break;
>>>>>> +		goto out;
>>>>>>
>>>>>>   	case SIOCSIFHWADDR:
>>>>>>   		/* Set hw address */
>>>>>> @@ -1375,9 +1482,9 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
>>>>>>   	}
>>>>>>
>>>>>>   unlock:
>>>>>> +	rcu_read_unlock();
>>>>>> +out:
>>>>>>   	rtnl_unlock();
>>>>>> -	if (tun)
>>>>>> -		tun_put(tun);
>>>>>>   	return ret;
>>>>>>   }
>>>>>>
>>>>>> @@ -1517,6 +1624,11 @@ out:
>>>>>>   	return ret;
>>>>>>   }
>>>>>>
>>>>>> +static void tun_sock_destruct(struct sock *sk)
>>>>>> +{
>>>>>> +	skb_queue_purge(&sk->sk_receive_queue);
>>>>>> +}
>>>>>> +
>>>>>>   static int tun_chr_open(struct inode *inode, struct file * file)
>>>>>>   {
>>>>>>   	struct net *net = current->nsproxy->net_ns;
>>>>>> @@ -1540,6 +1652,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
>>>>>>   	sock_init_data(&tfile->socket, &tfile->sk);
>>>>>>
>>>>>>   	tfile->sk.sk_write_space = tun_sock_write_space;
>>>>>> +	tfile->sk.sk_destruct = tun_sock_destruct;
>>>>>>   	tfile->sk.sk_sndbuf = INT_MAX;
>>>>>>   	file->private_data = tfile;
>>>>>>
>>>>>> @@ -1549,31 +1662,8 @@ static int tun_chr_open(struct inode *inode, struct file * file)
>>>>>>   static int tun_chr_close(struct inode *inode, struct file *file)
>>>>>>   {
>>>>>>   	struct tun_file *tfile = file->private_data;
>>>>>> -	struct tun_struct *tun;
>>>>>> -
>>>>>> -	tun = __tun_get(tfile);
>>>>>> -	if (tun) {
>>>>>> -		struct net_device *dev = tun->dev;
>>>>>> -
>>>>>> -		tun_debug(KERN_INFO, tun, "tun_chr_close\n");
>>>>>> -
>>>>>> -		__tun_detach(tun);
>>>>>> -
>>>>>> -		/* If desirable, unregister the netdevice. */
>>>>>> -		if (!(tun->flags & TUN_PERSIST)) {
>>>>>> -			rtnl_lock();
>>>>>> -			if (dev->reg_state == NETREG_REGISTERED)
>>>>>> -				unregister_netdevice(dev);
>>>>>> -			rtnl_unlock();
>>>>>> -		}
>>>>>>
>>>>>> -		/* drop the reference that netdevice holds */
>>>>>> -		sock_put(&tfile->sk);
>>>>>> -
>>>>>> -	}
>>>>>> -
>>>>>> -	/* drop the reference that file holds */
>>>>>> -	sock_put(&tfile->sk);
>>>>>> +	tun_detach(tfile, true);
>>>>>>
>>>>>>   	return 0;
>>>>>>   }
>>>>>> @@ -1700,14 +1790,17 @@ static void tun_cleanup(void)
>>>>>>    * holding a reference to the file for as long as the socket is in use. */
>>>>>>   struct socket *tun_get_socket(struct file *file)
>>>>>>   {
>>>>>> -	struct tun_struct *tun;
>>>>>> +	struct tun_struct *tun = NULL;
>>>>>>   	struct tun_file *tfile = file->private_data;
>>>>>>   	if (file->f_op != &tun_fops)
>>>>>>   		return ERR_PTR(-EINVAL);
>>>>>> -	tun = tun_get(file);
>>>>>> -	if (!tun)
>>>>>> +	rcu_read_lock();
>>>>>> +	tun = rcu_dereference(tfile->tun);
>>>>>> +	if (!tun) {
>>>>>> +		rcu_read_unlock();
>>>>>>   		return ERR_PTR(-EBADFD);
>>>>>> -	tun_put(tun);
>>>>>> +	}
>>>>>> +	rcu_read_unlock();
>>>>>>   	return &tfile->socket;
>>>>>>   }
>>>>>>   EXPORT_SYMBOL_GPL(tun_get_socket);
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* linux-next: manual merge of the wireless-next tree with the net-next tree
From: Stephen Rothwell @ 2012-06-28  2:40 UTC (permalink / raw)
  To: John W. Linville
  Cc: linux-next, linux-kernel, Joe Perches, Franky Lin, David Miller,
	netdev

Hi John,

Today's linux-next merge of the wireless-next tree got a conflict in
drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c between commit
2c208890c6d4 ("wireless: Remove casts to same type") from the net-next
tree and commit d610cde30b00 ("brcmfmac: use firmware data buffer
directly for nvram") from the wireless-next tree.

The latter removed the code modified by the former, so I used the latter.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

^ permalink raw reply

* Re: [PATCH] can: flexcan: use be32_to_cpup to handle the value of dt entry
From: Hui Wang @ 2012-06-28  1:54 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Shawn Guo, davem, netdev, linux-can, Hui Wang
In-Reply-To: <4FEAEFC1.4060104@pengutronix.de>

Marc Kleine-Budde wrote:
> On 06/27/2012 01:26 PM, Shawn Guo wrote:
>   
>> On 27 June 2012 17:27, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>>     
>>> From: Hui Wang <jason77.wang@gmail.com>
>>>
>>> Freescale's ARM-based i.MX series platforms can use this driver. The
>>> ARM CPU usually runs in little-endian mode by default, while device
>>> tree entry values are stored in big-endian format, so we should use
>>> be32_to_cpup() to read them. With this change the driver works on
>>> both little-endian and big-endian CPUs.
>>>
>>>       
>> I'm wondering if you want to just use of_property_read_u32() to make
>> it a little bit easier.
>>     
>
> Even better. Hui, can you send an updated patch?
>   
OK.

Regards,
Hui.
> Marc
>
>   
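
For readers following the endianness point above, here is a minimal
userspace sketch of why the conversion matters. It is not the flexcan
patch: demo_be32_to_cpup() and demo_raw_read() are hypothetical stand-ins
written for illustration, and the 24 MHz cell value is an assumed example.
In the kernel the same job is done by be32_to_cpup(), or by
of_property_read_u32() as Shawn suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel's be32_to_cpup(): read one
 * big-endian 32-bit device tree cell and return it in host byte order.
 * Assembling the value byte by byte gives the same result on
 * little-endian and big-endian hosts.
 */
static inline uint32_t demo_be32_to_cpup(const void *cell)
{
	const uint8_t *b = cell;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/*
 * What a driver effectively gets if it dereferences the cell without
 * converting: correct on a big-endian CPU, byte-swapped on a
 * little-endian one.
 */
static inline uint32_t demo_raw_read(const void *cell)
{
	uint32_t v;

	memcpy(&v, cell, sizeof(v));
	return v;
}

/* Assumed example: a cell holding 24000000 (0x016e3600), big-endian. */
static inline void demo_selfcheck(void)
{
	const uint8_t cell[4] = { 0x01, 0x6e, 0x36, 0x00 };

	assert(demo_be32_to_cpup(cell) == 24000000u);
}
```

On a little-endian host demo_raw_read() returns the byte-swapped value for
the same cell, which is the failure the original patch was addressing.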


^ permalink raw reply

* Re: [net-next PATCH 02/02] net/ipv4: VTI support new module for ip_vti.
From: David Miller @ 2012-06-28  1:19 UTC (permalink / raw)
  To: saurabh.mohan; +Cc: netdev
In-Reply-To: <20120628010218.GA4056@debian-saurabh-64.vyatta.com>

From: Saurabh <saurabh.mohan@vyatta.com>
Date: Wed, 27 Jun 2012 18:02:18 -0700

> +static int vti_err(struct sk_buff *skb, u32 info)

In net-next, individual ICMP error handlers must explicitly
handle PMTU messages.

Yours does not.
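
As a rough illustration of what "explicitly handle PMTU messages" means,
here is a self-contained userspace sketch. It is not kernel code and not
the fix for vti_err(): struct demo_tunnel and demo_tunnel_err() are
hypothetical, and only the ICMP type and code values mirror the real
ICMP_DEST_UNREACH and ICMP_FRAG_NEEDED constants. It assumes that for a
"fragmentation needed" error the info argument carries the next-hop MTU.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ICMP_DEST_UNREACH	3	/* same value as ICMP_DEST_UNREACH */
#define DEMO_ICMP_FRAG_NEEDED	4	/* same value as ICMP_FRAG_NEEDED */

/* Hypothetical per-tunnel state, for illustration only. */
struct demo_tunnel {
	uint32_t pmtu;		/* current path MTU estimate */
	unsigned int errors;	/* count of other ICMP errors seen */
};

/*
 * Hypothetical error handler. The point is the first branch: a
 * "fragmentation needed" error is recognised and acted on by the
 * handler itself rather than being lumped in with other errors.
 */
static inline void demo_tunnel_err(struct demo_tunnel *t, uint8_t type,
				   uint8_t code, uint32_t info)
{
	if (type == DEMO_ICMP_DEST_UNREACH && code == DEMO_ICMP_FRAG_NEEDED) {
		/* Only ever lower the estimate, and ignore a zero MTU. */
		if (info && info < t->pmtu)
			t->pmtu = info;
		return;
	}

	t->errors++;
}
```

A real handler would update the route's PMTU through whatever helper the
tree provides rather than a private field; the private field here just
keeps the sketch standalone.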

^ permalink raw reply
