Netdev List
 help / color / mirror / Atom feed
* [PATCH 1/3] be2net: reduce gso_max_size setting to account for ethernet header.
From: sarveshwar.bandi @ 2012-06-14  5:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, Sarveshwar Bandi
In-Reply-To: <1339653105-7313-1-git-send-email-sarveshwar.bandi@emulex.com>

From: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>

The maximum size of packet that can be handled by controller including ethernet
header is 65535. Reducing gso_max_size accordingly.

Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 5a34503..cbd245a 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3242,7 +3242,7 @@ static void be_netdev_init(struct net_device *netdev)
 
 	netdev->flags |= IFF_MULTICAST;
 
-	netif_set_gso_max_size(netdev, 65535);
+	netif_set_gso_max_size(netdev, 65535 - ETH_HLEN);
 
 	netdev->netdev_ops = &be_netdev_ops;
 
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 3/3] be2net: Increase statistics structure size for skyhawk.
From: sarveshwar.bandi @ 2012-06-14  5:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, Vasundhara Volam
In-Reply-To: <1339653105-7313-1-git-send-email-sarveshwar.bandi@emulex.com>

From: Vasundhara Volam <vasundhara.volam@emulex.com>

Increasing the hardware statistics structure to accomodate statistics for skyhawk.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be_cmds.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h b/drivers/net/ethernet/emulex/benet/be_cmds.h
index 2f6bb06..3c938f5 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.h
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
@@ -1566,7 +1566,7 @@ struct be_hw_stats_v1 {
 	u32 rsvd0[BE_TXP_SW_SZ];
 	struct be_erx_stats_v1 erx;
 	struct be_pmem_stats pmem;
-	u32 rsvd1[3];
+	u32 rsvd1[18];
 };
 
 struct be_cmd_req_get_stats_v1 {
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 0/3] be2net: patches fix max gso size, modify driver log info and stats struct.
From: sarveshwar.bandi @ 2012-06-14  5:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, Sarveshwar Bandi

From: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>

Please apply these patches. Thanks.

Sarveshwar Bandi (1):
  be2net: reduce gso_max_size setting to account for ethernet header.

Vasundhara Volam (2):
  be2net: Modify error message to incorporate subsystem
  be2net: Increase statistics structure size for skyhawk.

 drivers/net/ethernet/emulex/benet/be_cmds.c |   12 ++++++------
 drivers/net/ethernet/emulex/benet/be_cmds.h |    2 +-
 drivers/net/ethernet/emulex/benet/be_main.c |    2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

-- 
1.7.9.5

^ permalink raw reply

* [PATCH 2/3] be2net: Modify error message to incorporate subsystem
From: sarveshwar.bandi @ 2012-06-14  5:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, Vasundhara Volam
In-Reply-To: <1339653105-7313-1-git-send-email-sarveshwar.bandi@emulex.com>

From: Vasundhara Volam <vasundhara.volam@emulex.com>

Modify IOCTL error message to print subsystem also.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be_cmds.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index f899752..5eab791 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -122,15 +122,15 @@ static int be_mcc_compl_process(struct be_adapter *adapter,
 			goto done;
 
 		if (compl_status == MCC_STATUS_UNAUTHORIZED_REQUEST) {
-			dev_warn(&adapter->pdev->dev, "This domain(VM) is not "
-				"permitted to execute this cmd (opcode %d)\n",
-				opcode);
+			dev_warn(&adapter->pdev->dev,
+				 "opcode %d-%d is not permitted\n",
+				 opcode, subsystem);
 		} else {
 			extd_status = (compl->status >> CQE_STATUS_EXTD_SHIFT) &
 					CQE_STATUS_EXTD_MASK;
-			dev_err(&adapter->pdev->dev, "Cmd (opcode %d) failed:"
-				"status %d, extd-status %d\n",
-				opcode, compl_status, extd_status);
+			dev_err(&adapter->pdev->dev,
+				"opcode %d-%d failed:status %d-%d\n",
+				opcode, subsystem, compl_status, extd_status);
 		}
 	}
 done:
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH 2/5] ipv4: Kill ip_rt_frag_needed().
From: Steffen Klassert @ 2012-06-14  5:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120613.224203.297717896085583687.davem@davemloft.net>

On Wed, Jun 13, 2012 at 10:42:03PM -0700, David Miller wrote:
> 
> Ok, then if we want to do the fragmentation locally then we have to
> consider my initial patch which updates the PMTU in raw_err().
> 
> Did you test that?  I mean specifically, this patch:
> 
> http://marc.info/?l=linux-netdev&m=133945597319917&w=2
> 
> If it works for you, I will try to extend it to the other datagram
> cases.
> 

Yes, I can confirm that this one works. It restores the old behaviour.

^ permalink raw reply

* Re: [PATCH 2/5] ipv4: Kill ip_rt_frag_needed().
From: David Miller @ 2012-06-14  5:59 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev
In-Reply-To: <20120613.224203.297717896085583687.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 13 Jun 2012 22:42:03 -0700 (PDT)

> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Thu, 14 Jun 2012 07:35:29 +0200
> 
>> With your patch applied, we stop setting the DF bit after we
>> received a 'need to frag' ICMP message, but we don't fragment. We
>> send the packets out unfragmented. Before we removed
>> ip_rt_frag_needed(), we did the fragmentation according to the pmtu
>> informations we got from the icmp message. Now the router with the
>> low mtu has to do the fragmentation.
> 
> Ok, then if we want to do the fragmentation locally then we have to
> consider my initial patch which updates the PMTU in raw_err().
> 
> Did you test that?  I mean specifically, this patch:
> 
> http://marc.info/?l=linux-netdev&m=133945597319917&w=2
> 
> If it works for you, I will try to extend it to the other datagram
> cases.

Actually, thinking some more, we could extend my inet->pmtudisc patch
to achieve a similar effect.

Essentially we'd have a socket local PMTU value for datagram sockets.

Would you be OK with that approach?

I like the inet->pmtudisc way of solving this problem, because it:

1) Requires no special code to "remember" the flow used for the last
   socket sendmsg() call.

2) In the events of a malicious attempt to poison the routing cache
   PMTU information, only one socket will be harmed, rather than
   the whole system.

I tried to look for inspiration in other systems, but all of them lack
source based routing and other things we support, so they just use
a purely destination address based cache for PMTU information.

Other systems also don't have to deal with SO_BINDTODEVICE which
influences the route.

So we absolutely have to make our PMTU operations with the full
context used to emit the packet.

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2012-06-14  6:00 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


Hey Linus, this has the fixes for the wireless issues you ran into the
other week as well as:

1) Fix CAN c_can driver transmit handling resulting in BUG check triggers,
   from AnilKumar Ch.

2) Fix packet drop monitor sleeping in atomic context, from Eric Dumazet.

3) Fix mv643xx_eth driver build regression, from Andrew Lunn.

4) Inetpeer freeing needs an RCU grace period in order to avoid races
   during tree invalidation.  From Eric Dumazet.

5) Fix endianness bugs in xt_HMARK netfilter module, from Hans Schillstrom.

6) Add proper module refcounting to l2tp_eth to avoid crash on module
   unload, from Eric Dumazet.

7) Fix truncation of neighbour entry dumps due to logic errors in
   neigh_dump_info() and friends, from Eric Dumazet.

8) The conversion of fib6_age() to dst_neigh_lookup() accidently
   reversed the logic of a flags test, fix from Thomas Graf.

9) Fix checksum configuration in newer sky2 chips, from Stephen
   Hemminger.

10) Revert BQL support in NIU driver, doesn't work.

11) l2tp_ip_sendmsg() illegally uses a route without a proper reference.
    From Eric Dumazet.

12) be2net driver references an SKB after it's potentially been freed,
    also from Eric Dumazet.

13) Fix RCU stalls in dummy net driver init.  Also from Eric Dumazet.

14) lpc_eth has several bugs in it's transmit engine leading to packet
    leaks and improper queue wakes, from Eric Dumazet.

15) Apply short DMA workaround to more tg3 chips, from Matt Carlson.

16) Add tilegx network driver.

17) Bonding queue mapping for a packet can get corrupted, fix from
    Eric Dumazet.

18) Fix bug in netpoll_send_udp() SKB management that can leave garbage
    in the payload in certain situations.  From Eric Dumazet.

19) bnx2x driver interprets chip RX checksum offload incorrectly in
    encapsulation situations.  Fix from Eric Dumazet.

Please pull, thanks a lot!

The following changes since commit f8f5701bdaf9134b1f90e5044a82c66324d2073f:

  Linux 3.5-rc1 (2012-06-02 18:29:26 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to d6cb3e41386f20fb0777d0b59a2def82c65d37f7:

  bnx2x: fix checksum validation (2012-06-13 15:58:53 -0700)

----------------------------------------------------------------
Amitkumar Karwar (1):
      cfg80211: use sme_state in ibss start/join path

Andrew Lunn (1):
      mv643xx_eth: Fix compile error for architectures without clk.

AnilKumar Ch (3):
      can: c_can: fix "BUG! echo_skb is occupied!" during transmit
      can: c_can: fix an interrupt thrash issue with c_can driver
      can: c_can: fix race condition in c_can_open()

Arik Nemtsov (1):
      mac80211: fix non RCU-safe sta_list manipulation

Avinash Patil (2):
      mwifiex: invalidate bss config before setting channel for uAP
      mwifiex: support NL80211_HIDDEN_SSID_ZERO_LEN for uAP

Bjørn Mork (1):
      net: sierra_net: device IDs for Aircard 320U++

Bruce Allan (1):
      e1000e: test for valid check_reset_block function pointer

Chris Metcalf (1):
      tilegx network driver: initial support

Chun-Yeow Yeoh (1):
      mac80211: Fix Unreachable Mesh Station Problem when joining to another MBSS

Dave Jones (1):
      appletalk: Remove out of date message in printk

David S. Miller (4):
      Merge branch 'master' of git://gitorious.org/linux-can/linux-can
      Merge branch 'for-davem' of git://git.kernel.org/.../linville/wireless
      Revert "niu: Add support for byte queue limits."
      Merge branch 'master' of git://1984.lsi.us.es/net

Emmanuel Grumbach (2):
      iwlwifi: don't mess up the SCD when removing a key
      iwlwifi: disable the buggy chain extension feature in HW

Eric Dumazet (13):
      drop_monitor: dont sleep in atomic context
      inetpeer: fix a race in inetpeer_gc_worker()
      net: l2tp_eth: fix kernel panic on rmmod l2tp_eth
      net: neighbour: fix neigh_dump_info()
      l2tp: fix a race in l2tp_ip_sendmsg()
      be2net: fix a race in be_xmit()
      virtio-net: fix a race on 32bit arches
      dummy: fix rcu_sched self-detected stalls
      lpc_eth: add missing ndo_change_mtu()
      lpc_eth: fix tx completion
      bonding: Fix corrupted queue_mapping
      netpoll: fix netpoll_send_udp() bugs
      bnx2x: fix checksum validation

Felix Fietkau (1):
      mac80211: add missing rcu_read_lock/unlock in agg-rx session timer

Giuseppe CAVALLARO (5):
      net: icplus: fix interrupt mask
      stmmac: fix driver's doc when run kernel-doc script
      stmmac: update driver's doc
      stmmac: fix driver Kconfig when built as module
      stmmac: fix driver built w/ w/o both pci and platf modules

Haiyang Zhang (1):
      net/hyperv: Use wait_event on outstanding sends during device removal

Hans Schillstrom (1):
      netfilter: xt_HMARK: fix endianness and provide consistent hashing

Hauke Mehrtens (1):
      bcma: fix null pointer in bcma_core_pci_irq_ctl

Joe Perches (3):
      can: cc770: Fix likely misuse of | for &
      mac80211: Fix likely misuse of | for &
      brcmfmac: Fix likely misuse of | for &

Johannes Berg (8):
      iwlwifi: fix TX power antenna access
      mac80211_hwsim: advertise interface combinations
      mac80211: clean up remain-on-channel on interface stop
      iwlwifi: disable WoWLAN if !CONFIG_PM_SLEEP
      iwlwifi: fix double free/complete in firmware loading
      iwlwifi: unregister LEDs if mac80211 registration fails
      cfg80211: fix interface combinations check
      wireless: add my new trees to MAINTAINERS

John Fastabend (2):
      ixgbe: fix_features rxvlan is independent of DCB and needs to be set
      ixgbe: IXGBE_RXD_STAT_VP set even with Rx stripping enabled

John W. Linville (3):
      Merge branch 'master' of git://git.kernel.org/.../bluetooth/bluetooth
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem

Matt Carlson (1):
      tg3: Apply short DMA frag workaround to 5906

Meenakshi Venkataraman (2):
      mac80211: fix error in station state transitions during reconfig
      iwlwifi: use correct supported firmware for 6035 and 6000g2

Oleksij Rempel (1):
      b43: do not call ieee80211_unregister_hw if we are not registred

Pablo Neira Ayuso (1):
      netfilter: nf_ct_h323: fix bug in rtcp natting

Paul Pluzhnikov (1):
      net: Make linux/tcp.h C++ friendly (trivial)

Qasim Javed (1):
      mac80211_hwsim: Set IEEE80211_STAT_ACK flag when userspace indicates that the frame has been acknowledged.

Randy Dunlap (2):
      net/core: fix kernel-doc warnings
      netdev: fix drivers/net/phy/ kernel-doc warnings

Roland Dreier (1):
      net: Reorder initialization in ip_route_output to fix gcc warning

Sasha Levin (1):
      NFC: Fix possible NULL ptr deref when getting the name of a socket

Seth Forshee (1):
      bcma: add ext PA workaround for BCM4331 and BCM43431

Stanislav Yakovlev (1):
      net/wireless: ipw2100: Fix WARN_ON occurring in wiphy_register called by ipw2100_pci_init_one

Stanislaw Gruszka (4):
      mac80211: run scan after finish connection monitoring
      rt2x00: use atomic variable for seqno
      rtl8187: ->brightness_set can not sleep
      mac80211: add back channel change flag

Stefan Roese (1):
      net: stmmac: Fix clock en-/disable calls

Thomas Graf (1):
      ipv6: fib: Restore NTF_ROUTER exception in fib6_age()

Vincent Bernat (1):
      snmp: fix OutOctets counter to include forwarded datagrams

Vinicius Costa Gomes (1):
      Bluetooth: Fix checking the wrong flag when accepting a socket

Weiping Pan (1):
      bonding:record primary when modify it via sysfs

françois romieu (1):
      r8169: avoid NAPI scheduling delay.

stephen hemminger (1):
      sky2: fix checksum bit management on some chips

 Documentation/networking/stmmac.txt                   |   44 +-
 MAINTAINERS                                           |   12 +-
 drivers/bcma/driver_chipcommon_pmu.c                  |    4 +-
 drivers/bcma/driver_pci.c                             |    6 +-
 drivers/bcma/sprom.c                                  |    4 +-
 drivers/net/bonding/bond_main.c                       |    9 +-
 drivers/net/bonding/bond_sysfs.c                      |    8 +-
 drivers/net/can/c_can/c_can.c                         |   16 +-
 drivers/net/can/c_can/c_can.h                         |    1 +
 drivers/net/can/cc770/cc770_platform.c                |    2 +-
 drivers/net/dummy.c                                   |    4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h           |   15 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c       |   27 +-
 drivers/net/ethernet/broadcom/tg3.c                   |    3 +-
 drivers/net/ethernet/emulex/benet/be_main.c           |    5 +-
 drivers/net/ethernet/intel/e1000e/ethtool.c           |    6 +-
 drivers/net/ethernet/intel/e1000e/mac.c               |    2 +-
 drivers/net/ethernet/intel/e1000e/netdev.c            |    4 +-
 drivers/net/ethernet/intel/e1000e/phy.c               |    8 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c         |   22 +-
 drivers/net/ethernet/marvell/mv643xx_eth.c            |   15 +-
 drivers/net/ethernet/marvell/sky2.c                   |   10 +-
 drivers/net/ethernet/nxp/lpc_eth.c                    |   11 +-
 drivers/net/ethernet/realtek/r8169.c                  |    6 +-
 drivers/net/ethernet/stmicro/stmmac/Kconfig           |    4 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h          |   63 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c     |   35 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c      |   29 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c |    4 +-
 drivers/net/ethernet/sun/niu.c                        |   12 +-
 drivers/net/ethernet/tile/Kconfig                     |    2 +
 drivers/net/ethernet/tile/Makefile                    |    4 +-
 drivers/net/ethernet/tile/tilegx.c                    | 1898 +++++++++++++++++++++++++++++++++++++
 drivers/net/hyperv/hyperv_net.h                       |    1 +
 drivers/net/hyperv/netvsc.c                           |   12 +-
 drivers/net/phy/icplus.c                              |    7 +
 drivers/net/phy/mdio_bus.c                            |    2 +-
 drivers/net/usb/sierra_net.c                          |   14 +-
 drivers/net/virtio_net.c                              |   19 +-
 drivers/net/wireless/b43/b43.h                        |    4 +
 drivers/net/wireless/b43/main.c                       |   19 +-
 drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c      |    4 +-
 drivers/net/wireless/ipw2x00/ipw2100.c                |   20 +-
 drivers/net/wireless/iwlwifi/iwl-6000.c               |   23 +-
 drivers/net/wireless/iwlwifi/iwl-agn-sta.c            |    2 +-
 drivers/net/wireless/iwlwifi/iwl-drv.c                |    9 +-
 drivers/net/wireless/iwlwifi/iwl-eeprom.c             |   18 +-
 drivers/net/wireless/iwlwifi/iwl-mac80211.c           |    3 +
 drivers/net/wireless/iwlwifi/iwl-prph.h               |    1 +
 drivers/net/wireless/iwlwifi/iwl-trans-pcie.c         |    5 +
 drivers/net/wireless/mac80211_hwsim.c                 |   22 +
 drivers/net/wireless/mwifiex/cfg80211.c               |   13 +
 drivers/net/wireless/mwifiex/fw.h                     |    6 +
 drivers/net/wireless/mwifiex/uap_cmd.c                |   10 +
 drivers/net/wireless/rt2x00/rt2x00.h                  |    3 +-
 drivers/net/wireless/rt2x00/rt2x00mac.c               |    1 -
 drivers/net/wireless/rt2x00/rt2x00queue.c             |   13 +-
 drivers/net/wireless/rtl818x/rtl8187/leds.c           |    2 +-
 include/linux/netfilter/xt_HMARK.h                    |    5 +
 include/linux/tcp.h                                   |   20 +-
 include/net/inetpeer.h                                |    5 +-
 include/net/route.h                                   |    2 +-
 include/net/sch_generic.h                             |    7 +-
 net/appletalk/ddp.c                                   |    4 +-
 net/bluetooth/af_bluetooth.c                          |    2 +-
 net/core/drop_monitor.c                               |  102 +-
 net/core/filter.c                                     |    4 +-
 net/core/neighbour.c                                  |   14 +-
 net/core/netpoll.c                                    |   11 +-
 net/core/skbuff.c                                     |    2 +-
 net/ipv4/inetpeer.c                                   |   16 +-
 net/ipv4/ip_forward.c                                 |    1 +
 net/ipv4/ipmr.c                                       |    1 +
 net/ipv6/ip6_fib.c                                    |    2 +-
 net/ipv6/ip6_output.c                                 |    1 +
 net/ipv6/ip6mr.c                                      |    2 +
 net/l2tp/l2tp_eth.c                                   |    2 +
 net/l2tp/l2tp_ip.c                                    |    9 +-
 net/mac80211/agg-rx.c                                 |    7 +-
 net/mac80211/cfg.c                                    |    6 +-
 net/mac80211/iface.c                                  |   12 +
 net/mac80211/mlme.c                                   |   38 +-
 net/mac80211/offchannel.c                             |   16 +
 net/mac80211/sta_info.c                               |    4 +-
 net/mac80211/tx.c                                     |    9 +-
 net/mac80211/util.c                                   |    2 +-
 net/netfilter/nf_conntrack_h323_main.c                |    5 +-
 net/netfilter/xt_HMARK.c                              |   72 +-
 net/nfc/llcp/sock.c                                   |    3 +
 net/wireless/ibss.c                                   |    6 +-
 net/wireless/util.c                                   |   19 +-
 91 files changed, 2550 insertions(+), 389 deletions(-)
 create mode 100644 drivers/net/ethernet/tile/tilegx.c

^ permalink raw reply

* Re: PPPoE performance regression
From: Paul Mackerras @ 2012-06-14  6:18 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Nathan Williams, Karl Hiramoto, David S. Miller, netdev,
	John Crispin
In-Reply-To: <1339595401.11011.48.camel@shinybook.infradead.org>

On Wed, Jun 13, 2012 at 02:50:01PM +0100, David Woodhouse wrote:
> On Wed, 2012-06-13 at 10:57 +0100, David Woodhouse wrote:
> > This doesn't look *so* evil... if the basic concept of using
> > skb_orphan() and then setting our own destructor is OK, then I'll work
> > out the rest of the details and do it for l2tp too.
> 
> Stupid dwmw2. With patch this time...

> +static void pppoe_skb_destructor(struct sk_buff *skb)
> +{
> +	struct sock *sk = skb->sk;
> +	struct pppox_sock *po = pppox_sk(sk);
> +
> +	atomic_dec(&po->inflight);
> +	/* Schedule a call to ppp_output_wakeup(chan), if it was already blocked.
> +	   Mind for race conditions with another CPU which is in pppoe_xmit() 
> +	   right now. See commit 9d02daf7 in pppoatm. */
> +	sock_put(sk);
> +}

Umm, how does ppp_output_wakeup() actually get called?

Paul.

^ permalink raw reply

* Re: [PATCH 2/5] ipv4: Kill ip_rt_frag_needed().
From: Steffen Klassert @ 2012-06-14  6:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120613.225941.2175393318277942399.davem@davemloft.net>

On Wed, Jun 13, 2012 at 10:59:41PM -0700, David Miller wrote:
> 
> Actually, thinking some more, we could extend my inet->pmtudisc patch
> to achieve a similar effect.
> 
> Essentially we'd have a socket local PMTU value for datagram sockets.
> 
> Would you be OK with that approach?

This would require to maintain socket local pmtu expire times too.
Also, which of these pmtu values do we report if a user asks for
that? And how should we flush all these pmtu values?

It could have some side effects. But if we get it to work,
it would be an improvement. I'm fine with everything
that works in the end :-)

^ permalink raw reply

* Re: PPPoE performance regression
From: David Woodhouse @ 2012-06-14  6:49 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Nathan Williams, Karl Hiramoto, David S. Miller, netdev,
	John Crispin
In-Reply-To: <20120614061809.GA10453@drongo>

[-- Attachment #1: Type: text/plain, Size: 485 bytes --]

On Thu, 2012-06-14 at 16:18 +1000, Paul Mackerras wrote:
> Umm, how does ppp_output_wakeup() actually get called? 

Not directly from the destructor; it'll need to be from a tasklet like
the PPPoATM code does it. And the same race conditions, and the same
handling of them which makes me slightly uneasy as it depends quite
intimately on ppp_generic's internal locking scheme, will apply.

That's what I meant when I said I'd work out the rest of the details...

-- 
dwmw2

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply

* Re: [PATCH 2/5] ipv4: Kill ip_rt_frag_needed().
From: David Miller @ 2012-06-14  6:54 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev
In-Reply-To: <20120614063632.GS27795@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 14 Jun 2012 08:36:32 +0200

> On Wed, Jun 13, 2012 at 10:59:41PM -0700, David Miller wrote:
>> 
>> Actually, thinking some more, we could extend my inet->pmtudisc patch
>> to achieve a similar effect.
>> 
>> Essentially we'd have a socket local PMTU value for datagram sockets.
>> 
>> Would you be OK with that approach?
> 
> This would require to maintain socket local pmtu expire times too.
> Also, which of these pmtu values do we report if a user asks for
> that? And how should we flush all these pmtu values?
> 
> It could have some side effects. But if we get it to work,
> it would be an improvement. I'm fine with everything
> that works in the end :-)

Your right, maybe the route updating approach is therefore better.

I'll play around with my original patch.

Thanks.

^ permalink raw reply

* Re: [PATCH 0/8] dcbnl: Major simplifications
From: Thomas Graf @ 2012-06-14  7:54 UTC (permalink / raw)
  To: David Miller; +Cc: tgraf, netdev, lucy.liu, john.r.fastabend, alexander.h.duyck
In-Reply-To: <20120613.155541.26513304496438357.davem@davemloft.net>

On Wed, Jun 13, 2012 at 03:55:41PM -0700, David Miller wrote:
> Lots of deleted code, I like it :-)
> 
> Applied, but could you send a follow-on patch to use BUG_ON() instead
> of that "if (!ptr) { /* ... */ BUG(); }" construct?

Sure, I must have had a weak moment right there :)

^ permalink raw reply

* RE: am3359 (cpsw): unable to get gigabit link working (PHY: ICPlus IP1001)
From: N, Mugunthan V @ 2012-06-14  8:16 UTC (permalink / raw)
  To: Yegor Yefremov, netdev; +Cc: Chemparathy, Cyril, Govindarajan, Sriramakrishnan
In-Reply-To: <CAGm1_kurEedQc=5dR-mWPingRZcG9s0y9VDY=UijQC1X9RhJCw@mail.gmail.com>

> -----Original Message-----
> From: Yegor Yefremov [mailto:yegorslists@googlemail.com]
> Sent: Wednesday, June 13, 2012 5:59 PM
> To: netdev
> Cc: Chemparathy, Cyril; Govindarajan, Sriramakrishnan; N, Mugunthan V
> Subject: am3359 (cpsw): unable to get gigabit link working (PHY: ICPlus
> IP1001)
> 
> I'm using a custom am3359 based board. The first CPSW ports is
> connected to ICPlus IP1001 phy via RGMII interface. I have no problem
> with 100mbit link, but as soon as I connect my board to a 1Gbit switch
> the communication fails. I can see that some data goes on wire, but
> switch refuses to forward it. I can't receive anything either. Phy
>. detects 1Gbit link correctly
> 
> My kernel: https://github.com/koenkooi/linux
> 
> I also define phy interface as RGMII:
> 
> static struct cpsw_slave_data am33xx_cpsw_slaves[] = {
>         {
>                 .slave_reg_ofs  = 0x208,
>                 .sliver_reg_ofs = 0xd80,
>                 .phy_id         = "0:02",
>                 .phy_if               = PHY_INTERFACE_MODE_RGMII,
>         },
>         {
>                 .slave_reg_ofs  = 0x308,
>                 .sliver_reg_ofs = 0xdc0,
>                 .phy_id         = "0:01",
>         },
> };
> 
> As far as I can see in the mainline cpsw driver
> PHY_INTERFACE_MODE_RGMII will be ignored. Why? In the current driver
> it sets two bits in mac_control register:
> 
> if (phy->interface == PHY_INTERFACE_MODE_RGMII) /* RGMII */
>                         mac_control |= (BIT(15)|BIT(16));
> 
> Any idea where should I look to solve the problem?

Phy interface in cpsw is always MII, there is an external gasket to convert from MII/GMII to RMII/RGMII. So this is a dead code resides in current driver and will be removed, that's why this part of code is not up streamed.

Can you verify the design/board layout with hardware engineer; also look for cpsw statistics counters for errors.

Regards,
Mugunthan V N. 

^ permalink raw reply

* Re: am3359 (cpsw): unable to get gigabit link working (PHY: ICPlus IP1001)
From: Yegor Yefremov @ 2012-06-14  8:28 UTC (permalink / raw)
  To: N, Mugunthan V; +Cc: netdev, Chemparathy, Cyril, Govindarajan, Sriramakrishnan
In-Reply-To: <EB1619762EAF8B4E97A227FB77B7E0293E9B6D3B@DBDE01.ent.ti.com>

On Thu, Jun 14, 2012 at 10:16 AM, N, Mugunthan V <mugunthanvnm@ti.com> wrote:
>> -----Original Message-----
>> From: Yegor Yefremov [mailto:yegorslists@googlemail.com]
>> Sent: Wednesday, June 13, 2012 5:59 PM
>> To: netdev
>> Cc: Chemparathy, Cyril; Govindarajan, Sriramakrishnan; N, Mugunthan V
>> Subject: am3359 (cpsw): unable to get gigabit link working (PHY: ICPlus
>> IP1001)
>>
>> I'm using a custom am3359 based board. The first CPSW ports is
>> connected to ICPlus IP1001 phy via RGMII interface. I have no problem
>> with 100mbit link, but as soon as I connect my board to a 1Gbit switch
>> the communication fails. I can see that some data goes on wire, but
>> switch refuses to forward it. I can't receive anything either. Phy
>>. detects 1Gbit link correctly
>>
>> My kernel: https://github.com/koenkooi/linux
>>
>> I also define phy interface as RGMII:
>>
>> static struct cpsw_slave_data am33xx_cpsw_slaves[] = {
>>         {
>>                 .slave_reg_ofs  = 0x208,
>>                 .sliver_reg_ofs = 0xd80,
>>                 .phy_id         = "0:02",
>>                 .phy_if               = PHY_INTERFACE_MODE_RGMII,
>>         },
>>         {
>>                 .slave_reg_ofs  = 0x308,
>>                 .sliver_reg_ofs = 0xdc0,
>>                 .phy_id         = "0:01",
>>         },
>> };
>>
>> As far as I can see in the mainline cpsw driver
>> PHY_INTERFACE_MODE_RGMII will be ignored. Why? In the current driver
>> it sets two bits in mac_control register:
>>
>> if (phy->interface == PHY_INTERFACE_MODE_RGMII) /* RGMII */
>>                         mac_control |= (BIT(15)|BIT(16));
>>
>> Any idea where should I look to solve the problem?
>
> Phy interface in cpsw is always MII, there is an external gasket to convert from MII/GMII to RMII/RGMII. So this is a dead code resides in current driver and will be removed, that's why this part of code is not up streamed.
>
> Can you verify the design/board layout with hardware engineer; also look for cpsw statistics counters for errors.

Thanks for the quick reply.

I have just fixed the problem. In this thread
http://e2e.ti.com/support/dsp/sitara_arm174_microprocessors/f/791/p/172049/685105.aspx#685105
Paul suggests to disable internal delay. I followed his suggestion and
now it works.

#define RGMII_MODE_ENABLE       0x3A

 __raw_writel(RGMII_MODE_ENABLE,
                        AM33XX_CTRL_REGADDR(MAC_MII_SEL));

Yegor

^ permalink raw reply

* Re: include/net/netlink.h:497:41: warning: ‘reply_nlh’ may be used uninitialized in this function [-Wuninitialized]
From: Thomas Graf @ 2012-06-14  8:30 UTC (permalink / raw)
  To: wfg; +Cc: netdev, David S. Miller
In-Reply-To: <4fd9474c.wdaRvq8n+Xh/6wOA%wfg@linux.intel.com>

On Thu, Jun 14, 2012 at 10:07:08AM +0800, wfg@linux.intel.com wrote:
> FYI: there are new compile warnings show up in
> 
> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
> head:   0450243096de90ff51c3a6c605410c5e28d79f8d
> commit: 33a03aadb52fa05d28aba6d8f0c03c7b3b905897 [464/472] dcbnl: Prepare framework to shorten handling functions
> config: x86_64-randconfig-net3 (attached as .config)
> 
> All related error/warning messages are:
> 
> net/dcb/dcbnl.c: In function ‘dcb_doit’:
> include/net/netlink.h:497:41: warning: ‘reply_nlh’ may be used uninitialized in this function [-Wuninitialized]
> net/dcb/dcbnl.c:1975:19: note: ‘reply_nlh’ was declared here

gcc 4.6.3 on my system doesn't complain about this. I'm sending a patch.

^ permalink raw reply

* [PATCH] dcbnl: Silence harmless gcc warning about uninitialized reply_nlh
From: Thomas Graf @ 2012-06-14  8:34 UTC (permalink / raw)
  To: davem; +Cc: netdev, wfg
In-Reply-To: <4fd9474c.wdaRvq8n+Xh/6wOA%wfg@linux.intel.com>


Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/dcb/dcbnl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 70bba3e..da6ee81 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1664,7 +1664,7 @@ static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	u32 pid = skb ? NETLINK_CB(skb).pid : 0;
 	int ret = -EINVAL;
 	struct sk_buff *reply_skb;
-	struct nlmsghdr *reply_nlh;
+	struct nlmsghdr *reply_nlh = NULL;
 	const struct reply_func *fn;
 
 	if (!net_eq(net, &init_net))

^ permalink raw reply related

* [PATCH] net: remove my future former mail address
From: Rémi Denis-Courmont @ 2012-06-14  8:29 UTC (permalink / raw)
  To: netdev; +Cc: Rémi Denis-Courmont, Rémi Denis-Courmont, Sakari Ailus

From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
Cc: Sakari Ailus <sakari.ailus@nokia.com>
---
 include/net/phonet/gprs.h |    2 +-
 net/caif/caif_dev.c       |    3 +--
 net/phonet/af_phonet.c    |    4 ++--
 net/phonet/datagram.c     |    4 ++--
 net/phonet/pep-gprs.c     |    2 +-
 net/phonet/pep.c          |    2 +-
 net/phonet/pn_dev.c       |    4 ++--
 net/phonet/pn_netlink.c   |    4 ++--
 net/phonet/socket.c       |    4 ++--
 net/phonet/sysctl.c       |    2 +-
 10 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/include/net/phonet/gprs.h b/include/net/phonet/gprs.h
index 928daf5..bcd525e 100644
--- a/include/net/phonet/gprs.h
+++ b/include/net/phonet/gprs.h
@@ -5,7 +5,7 @@
  *
  * Copyright (C) 2008 Nokia Corporation.
  *
- * Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
+ * Author: Rémi Denis-Courmont
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
diff --git a/net/caif/caif_dev.c b/net/caif/caif_dev.c
index aa6f716..554b312 100644
--- a/net/caif/caif_dev.c
+++ b/net/caif/caif_dev.c
@@ -4,8 +4,7 @@
  * Author:	Sjur Brendeland/sjur.brandeland@stericsson.com
  * License terms: GNU General Public License (GPL) version 2
  *
- * Borrowed heavily from file: pn_dev.c. Thanks to
- *  Remi Denis-Courmont <remi.denis-courmont@nokia.com>
+ * Borrowed heavily from file: pn_dev.c. Thanks to Remi Denis-Courmont
  *  and Sakari Ailus <sakari.ailus@nokia.com>
  */
 
diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c
index 779ce4f..5a940db 100644
--- a/net/phonet/af_phonet.c
+++ b/net/phonet/af_phonet.c
@@ -5,8 +5,8 @@
  *
  * Copyright (C) 2008 Nokia Corporation.
  *
- * Contact: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
- * Original author: Sakari Ailus <sakari.ailus@nokia.com>
+ * Authors: Sakari Ailus <sakari.ailus@nokia.com>
+ *          Rémi Denis-Courmont
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
diff --git a/net/phonet/datagram.c b/net/phonet/datagram.c
index bf35b4e..12c30f3 100644
--- a/net/phonet/datagram.c
+++ b/net/phonet/datagram.c
@@ -5,8 +5,8 @@
  *
  * Copyright (C) 2008 Nokia Corporation.
  *
- * Contact: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
- * Original author: Sakari Ailus <sakari.ailus@nokia.com>
+ * Authors: Sakari Ailus <sakari.ailus@nokia.com>
+ *          Rémi Denis-Courmont
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
diff --git a/net/phonet/pep-gprs.c b/net/phonet/pep-gprs.c
index d012089..a2fba7e 100644
--- a/net/phonet/pep-gprs.c
+++ b/net/phonet/pep-gprs.c
@@ -5,7 +5,7 @@
  *
  * Copyright (C) 2008 Nokia Corporation.
  *
- * Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
+ * Author: Rémi Denis-Courmont
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index 9dd4f92..576f22c 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -5,7 +5,7 @@
  *
  * Copyright (C) 2008 Nokia Corporation.
  *
- * Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
+ * Author: Rémi Denis-Courmont
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c
index 36f75a9..5bf6341 100644
--- a/net/phonet/pn_dev.c
+++ b/net/phonet/pn_dev.c
@@ -5,8 +5,8 @@
  *
  * Copyright (C) 2008 Nokia Corporation.
  *
- * Contact: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
- * Original author: Sakari Ailus <sakari.ailus@nokia.com>
+ * Authors: Sakari Ailus <sakari.ailus@nokia.com>
+ *          Rémi Denis-Courmont
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
diff --git a/net/phonet/pn_netlink.c b/net/phonet/pn_netlink.c
index cfdf135..7dd762a 100644
--- a/net/phonet/pn_netlink.c
+++ b/net/phonet/pn_netlink.c
@@ -5,8 +5,8 @@
  *
  * Copyright (C) 2008 Nokia Corporation.
  *
- * Contact: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
- * Original author: Sakari Ailus <sakari.ailus@nokia.com>
+ * Authors: Sakari Ailus <sakari.ailus@nokia.com>
+ *          Remi Denis-Courmont
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 89cfa9c..0acc943 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -5,8 +5,8 @@
  *
  * Copyright (C) 2008 Nokia Corporation.
  *
- * Contact: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
- * Original author: Sakari Ailus <sakari.ailus@nokia.com>
+ * Authors: Sakari Ailus <sakari.ailus@nokia.com>
+ *          Rémi Denis-Courmont
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
diff --git a/net/phonet/sysctl.c b/net/phonet/sysctl.c
index 696348f..d6bbbbd 100644
--- a/net/phonet/sysctl.c
+++ b/net/phonet/sysctl.c
@@ -5,7 +5,7 @@
  *
  * Copyright (C) 2008 Nokia Corporation.
  *
- * Contact: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
+ * Author: Rémi Denis-Courmont
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH] dcbnl: Use BUG_ON() instead of BUG()
From: Thomas Graf @ 2012-06-14  8:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, lucy.liu, john.r.fastabend, alexander.h.duyck
In-Reply-To: <20120613.155541.26513304496438357.davem@davemloft.net>


Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/dcb/dcbnl.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index da6ee81..0a36007 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -208,10 +208,7 @@ static struct sk_buff *dcbnl_newmsg(int type, u8 cmd, u32 port, u32 seq,
 		return NULL;
 
 	nlh = nlmsg_put(skb, port, seq, type, sizeof(*dcb), flags);
-	if (!nlh) {
-		/* header should always fit, allocation must be buggy */
-		BUG();
-	}
+	BUG_ON(!nlh);
 
 	dcb = nlmsg_data(nlh);
 	dcb->dcb_family = AF_UNSPEC;

^ permalink raw reply related

* Re: [PATCH] dcbnl: Silence harmless gcc warning about uninitialized reply_nlh
From: David Miller @ 2012-06-14  8:46 UTC (permalink / raw)
  To: tgraf; +Cc: netdev, wfg
In-Reply-To: <20120614083403.GB29738@canuck.infradead.org>

From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 14 Jun 2012 04:34:03 -0400

> Signed-off-by: Thomas Graf <tgraf@suug.ch>

Applied.

^ permalink raw reply

* Re: [PATCH] dcbnl: Use BUG_ON() instead of BUG()
From: David Miller @ 2012-06-14  8:46 UTC (permalink / raw)
  To: tgraf; +Cc: netdev, lucy.liu, john.r.fastabend, alexander.h.duyck
In-Reply-To: <20120614084015.GC29738@canuck.infradead.org>

From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 14 Jun 2012 04:40:15 -0400

> Signed-off-by: Thomas Graf <tgraf@suug.ch>

Applied.

^ permalink raw reply

* Regression on TX throughput when using bonding
From: Jean-Michel Hautbois @ 2012-06-14  8:58 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet

Hi all,

I have bisected a regression which concerns TX throughput when using bonding.
I tested only with 10Gbps cards, as it appears when bandwidth need is
over 1Gbps on my machine.
I send UDP multicast packets over bonding and observe the tc result.

When KO :
$>tc -s -d qdisc show dev eth1
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1
1 1 1 1 1 1
 Sent 1106527591 bytes 273802 pkt (dropped 306419, overlimits 0 requeues 223)
 backlog 0b 0p requeues 223

Ok course, when OK, dropped is 0.
$>tc -s -d qdisc show dev eth1
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1
1 1 1 1 1 1
 Sent 1648662087 bytes 408009 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0


Here is the incriminated commit:

fc6055a5ba31e2c14e36e8939f9bf2b6d586a7f5 is the first bad commit
commit fc6055a5ba31e2c14e36e8939f9bf2b6d586a7f5
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Fri Apr 16 12:18:22 2010 +0000

    net: Introduce skb_orphan_try()

    Transmitted skb might be attached to a socket and a destructor, for
    memory accounting purposes.

    Traditionally, this destructor is called at tx completion time, when skb
    is freed.

    When tx completion is performed by another cpu than the sender, this
    forces some cache lines to change ownership. XPS was an attempt to give
    tx completion to initial cpu.

    David idea is to call destructor right before giving skb to device (call
    to ndo_start_xmit()). Because device queues are usually small, orphaning
    skb before tx completion is not a big deal. Some drivers already do
    this, we could do it in upper level.

    There is one known exception to this early orphaning, called tx
    timestamping. It needs to keep a reference to socket until device can
    give a hardware or software timestamp.

    This patch adds a skb_orphan_try() helper, to centralize all exceptions
    to early orphaning in one spot, and use it in dev_hard_start_xmit().

    "tbench 16" results on a Nehalem machine (2 X5570  @ 2.93GHz)
    before: Throughput 4428.9 MB/sec 16 procs
    after: Throughput 4448.14 MB/sec 16 procs

    UDP should get even better results, its destructor being more complex,
    since SOCK_USE_WRITE_QUEUE is not set (four atomic ops instead of one)

    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

JM

^ permalink raw reply

* Re: [net-next PATCH 02/02] net/ipv4: VTI support new module for ip_vti.
From: Steffen Klassert @ 2012-06-14  9:12 UTC (permalink / raw)
  To: Saurabh; +Cc: netdev
In-Reply-To: <20120608173253.GA11962@debian-saurabh-64.vyatta.com>

On Fri, Jun 08, 2012 at 10:32:53AM -0700, Saurabh wrote:
> 
> 
> New VTI tunnel kernel module, Kconfig and Makefile changes.

It is an interesting feature, do you plan for an IPv6
version too?

I made some comments on the code below.

Thanks.

> 
> Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
> Reviewed-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
> diff --git a/include/linux/if_tunnel.h b/include/linux/if_tunnel.h
> index 16b92d0..4b4ce17 100644
> --- a/include/linux/if_tunnel.h
> +++ b/include/linux/if_tunnel.h
> @@ -80,4 +80,15 @@ enum {
>  
>  #define IFLA_GRE_MAX	(__IFLA_GRE_MAX - 1)
>  
> +enum {
> +	IFLA_VTI_UNSPEC,
> +	IFLA_VTI_LINK,
> +	IFLA_VTI_IKEY,
> +	IFLA_VTI_OKEY,
> +	IFLA_VTI_LOCAL,
> +	IFLA_VTI_REMOTE,
> +	__IFLA_VTI_MAX,
> +};
> +
> +#define IFLA_VTI_MAX	(__IFLA_VTI_MAX - 1)
>  #endif /* _IF_TUNNEL_H_ */
> diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
> index 20f1cb5..3a95308 100644
> --- a/net/ipv4/Kconfig
> +++ b/net/ipv4/Kconfig
> @@ -310,6 +310,20 @@ config SYN_COOKIES
>  
>  	  If unsure, say N.
>  
> +config NET_IPVTI
> +    tristate "Virtual (secure) IP: tunneling"
> +    select INET_TUNNEL

You added your register/deregister functions to net/ipv4/xfrm4_mode_tunnel.c,
so this should somehow select or depend on INET_XFRM_MODE_TUNNEL.

> +    ---help---
> +      Tunneling means encapsulating data of one protocol type within
> +      another protocol and sending it over a channel that understands the
> +      Pencapsulating protocol. This particular tunneling driver implements
> +      encapsulation of IP within IP-ESP. This can be used with xfrm to give
> +      the notion of a secure tunnel and then use routing protocol on top.
> +
> +      Saying Y to this option will produce one module ( = code which can
> +      be inserted in and removed from the running kernel whenever you
> +      want). Most people won't need this and can say N.
> +
>  config INET_AH
>  	tristate "IP: AH transformation"
>  	select XFRM_ALGO
> diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
> index ff75d3b..3999ce9 100644
> --- a/net/ipv4/Makefile
> +++ b/net/ipv4/Makefile
> @@ -20,6 +20,7 @@ obj-$(CONFIG_IP_MROUTE) += ipmr.o
>  obj-$(CONFIG_NET_IPIP) += ipip.o
>  obj-$(CONFIG_NET_IPGRE_DEMUX) += gre.o
>  obj-$(CONFIG_NET_IPGRE) += ip_gre.o
> +obj-$(CONFIG_NET_IPVTI) += ip_vti.o
>  obj-$(CONFIG_SYN_COOKIES) += syncookies.o
>  obj-$(CONFIG_INET_AH) += ah4.o
>  obj-$(CONFIG_INET_ESP) += esp4.o
> diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
> new file mode 100644
> index 0000000..3eaa47c
> --- /dev/null
> +++ b/net/ipv4/ip_vti.c
> @@ -0,0 +1,980 @@
> +/*
> + *	Linux NET3:	IP/IP protocol decoder modified to support virtual tunnel interface
> + *
> + *	Authors:
> + *		Sam Lantinga (slouken@cs.ucdavis.edu)  02/01/95
> + *		Saurabh Mohan (saurabh.mohan@vyatta.com) 05/07/2012
> + *
> + *	This program is free software; you can redistribute it and/or
> + *	modify it under the terms of the GNU General Public License
> + *	as published by the Free Software Foundation; either version
> + *	2 of the License, or (at your option) any later version.
> + *
> + */
> +
> +/*
> +   This version of net/ipv4/ip_vti.c is cloned of net/ipv4/ipip.c
> +
> +   For comments look at net/ipv4/ip_gre.c --ANK
> + */
> +
> +
> +#include <linux/capability.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/uaccess.h>
> +#include <linux/skbuff.h>
> +#include <linux/netdevice.h>
> +#include <linux/in.h>
> +#include <linux/tcp.h>
> +#include <linux/udp.h>
> +#include <linux/if_arp.h>
> +#include <linux/mroute.h>
> +#include <linux/init.h>
> +#include <linux/netfilter_ipv4.h>
> +#include <linux/if_ether.h>
> +
> +#include <net/sock.h>
> +#include <net/ip.h>
> +#include <net/icmp.h>
> +#include <net/ipip.h>
> +#include <net/inet_ecn.h>
> +#include <net/xfrm.h>
> +#include <net/net_namespace.h>
> +#include <net/netns/generic.h>
> +
> +#define HASH_SIZE  16
> +#define HASH(addr) (((__force u32)addr^((__force u32)addr>>4))&0xF)
> +
> +static struct rtnl_link_ops vti_link_ops __read_mostly;
> +
> +static int vti_net_id __read_mostly;
> +struct vti_net {
> +	struct ip_tunnel __rcu *tunnels_r_l[HASH_SIZE];
> +	struct ip_tunnel __rcu *tunnels_r[HASH_SIZE];
> +	struct ip_tunnel __rcu *tunnels_l[HASH_SIZE];
> +	struct ip_tunnel __rcu *tunnels_wc[1];
> +	struct ip_tunnel **tunnels[4];
> +
> +	struct net_device *fb_tunnel_dev;
> +};
> +
> +static int vti_fb_tunnel_init(struct net_device *dev);
> +static int vti_tunnel_init(struct net_device *dev);
> +static void vti_tunnel_setup(struct net_device *dev);
> +static void vti_dev_free(struct net_device *dev);
> +static int vti_tunnel_bind_dev(struct net_device *dev);
> +
> +/*
> + * Locking : hash tables are protected by RCU and RTNL
> + */
> +
> +#define for_each_ip_tunnel_rcu(start) \
> +	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
> +
> +/* often modified stats are per cpu, other are shared (netdev->stats) */
> +struct pcpu_tstats {
> +	u64	rx_packets;
> +	u64	rx_bytes;
> +	u64	tx_packets;
> +	u64	tx_bytes;
> +	struct	u64_stats_sync	syncp;
> +};
> +
> +#define VTI_XMIT(stats1, stats2) do {				\
> +	int err;						\
> +	int pkt_len = skb->len;					\
> +	err = dst_output(skb);					\
> +	if (net_xmit_eval(err) == 0) {				\
> +		(stats1)->tx_bytes += pkt_len;			\
> +		(stats1)->tx_packets++;				\
> +	} else {						\
> +		(stats2)->tx_errors++;				\
> +		(stats2)->tx_aborted_errors++;			\
> +	}							\
> +} while (0)
> +
> +
> +static struct rtnl_link_stats64 *vti_get_stats64(struct net_device *dev,
> +					       struct rtnl_link_stats64 *tot)
> +{
> +	int i;
> +
> +	for_each_possible_cpu(i) {
> +		const struct pcpu_tstats *tstats = per_cpu_ptr(dev->tstats, i);
> +		u64 rx_packets, rx_bytes, tx_packets, tx_bytes;
> +		unsigned int start;
> +
> +		do {
> +			start = u64_stats_fetch_begin_bh(&tstats->syncp);
> +			rx_packets = tstats->rx_packets;
> +			tx_packets = tstats->tx_packets;
> +			rx_bytes = tstats->rx_bytes;
> +			tx_bytes = tstats->tx_bytes;
> +		} while (u64_stats_fetch_retry_bh(&tstats->syncp, start));
> +
> +		tot->rx_packets += rx_packets;
> +		tot->tx_packets += tx_packets;
> +		tot->rx_bytes   += rx_bytes;
> +		tot->tx_bytes   += tx_bytes;
> +	}
> +
> +	tot->multicast = dev->stats.multicast;
> +	tot->rx_crc_errors = dev->stats.rx_crc_errors;
> +	tot->rx_fifo_errors = dev->stats.rx_fifo_errors;
> +	tot->rx_length_errors = dev->stats.rx_length_errors;
> +	tot->rx_errors = dev->stats.rx_errors;
> +	tot->tx_fifo_errors = dev->stats.tx_fifo_errors;
> +	tot->tx_carrier_errors = dev->stats.tx_carrier_errors;
> +	tot->tx_dropped = dev->stats.tx_dropped;
> +	tot->tx_aborted_errors = dev->stats.tx_aborted_errors;
> +	tot->tx_errors = dev->stats.tx_errors;
> +
> +	return tot;
> +}
> +
> +static struct ip_tunnel *vti_tunnel_lookup(struct net *net,
> +					 __be32 remote, __be32 local)
> +{
> +	unsigned h0 = HASH(remote);
> +	unsigned h1 = HASH(local);
> +	struct ip_tunnel *t;
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +
> +	for_each_ip_tunnel_rcu(ipn->tunnels_r_l[h0 ^ h1])
> +		if (local == t->parms.iph.saddr &&
> +		    remote == t->parms.iph.daddr && (t->dev->flags&IFF_UP))
> +			return t;
> +	for_each_ip_tunnel_rcu(ipn->tunnels_r[h0])
> +		if (remote == t->parms.iph.daddr && (t->dev->flags&IFF_UP))
> +			return t;
> +
> +	for_each_ip_tunnel_rcu(ipn->tunnels_l[h1])
> +		if (local == t->parms.iph.saddr && (t->dev->flags&IFF_UP))
> +			return t;
> +
> +	for_each_ip_tunnel_rcu(ipn->tunnels_wc[0])
> +		if (t && (t->dev->flags&IFF_UP))
> +			return t;
> +	return NULL;
> +}
> +
> +static struct ip_tunnel **__vti_bucket(struct vti_net *ipn,
> +				     struct ip_tunnel_parm *parms)
> +{
> +	__be32 remote = parms->iph.daddr;
> +	__be32 local = parms->iph.saddr;
> +	unsigned h = 0;
> +	int prio = 0;
> +
> +	if (remote) {
> +		prio |= 2;
> +		h ^= HASH(remote);
> +	}
> +	if (local) {
> +		prio |= 1;
> +		h ^= HASH(local);
> +	}
> +	return &ipn->tunnels[prio][h];
> +}
> +
> +static inline struct ip_tunnel **vti_bucket(struct vti_net *ipn,
> +					  struct ip_tunnel *t)
> +{
> +	return __vti_bucket(ipn, &t->parms);
> +}
> +
> +static void vti_tunnel_unlink(struct vti_net *ipn, struct ip_tunnel *t)
> +{
> +	struct ip_tunnel __rcu **tp;
> +	struct ip_tunnel *iter;
> +
> +	for (tp = vti_bucket(ipn, t);
> +	     (iter = rtnl_dereference(*tp)) != NULL;
> +	     tp = &iter->next) {
> +		if (t == iter) {
> +			rcu_assign_pointer(*tp, t->next);
> +			break;
> +		}
> +	}
> +}
> +
> +static void vti_tunnel_link(struct vti_net *ipn, struct ip_tunnel *t)
> +{
> +	struct ip_tunnel __rcu **tp = vti_bucket(ipn, t);
> +
> +	rcu_assign_pointer(t->next, rtnl_dereference(*tp));
> +	rcu_assign_pointer(*tp, t);
> +}
> +
> +static struct ip_tunnel *vti_tunnel_locate(struct net *net,
> +					 struct ip_tunnel_parm *parms,
> +					 int create)
> +{
> +	__be32 remote = parms->iph.daddr;
> +	__be32 local = parms->iph.saddr;
> +	struct ip_tunnel *t, *nt;
> +	struct ip_tunnel __rcu **tp;
> +	struct net_device *dev;
> +	char name[IFNAMSIZ];
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +
> +	for (tp = __vti_bucket(ipn, parms);
> +	     (t = rtnl_dereference(*tp)) != NULL;
> +	     tp = &t->next) {
> +		if (local == t->parms.iph.saddr && remote == t->parms.iph.daddr)
> +			return t;
> +	}
> +	if (!create)
> +		return NULL;
> +
> +	if (parms->name[0])
> +		strlcpy(name, parms->name, IFNAMSIZ);
> +	else
> +		strcpy(name, "vti%d");
> +
> +	dev = alloc_netdev(sizeof(*t), name, vti_tunnel_setup);
> +	if (dev == NULL)
> +		return NULL;
> +
> +	dev_net_set(dev, net);
> +
> +	nt = netdev_priv(dev);
> +	nt->parms = *parms;
> +	dev->rtnl_link_ops = &vti_link_ops;
> +
> +	vti_tunnel_bind_dev(dev);
> +
> +	if (register_netdevice(dev) < 0)
> +		goto failed_free;
> +
> +	dev_hold(dev);
> +	vti_tunnel_link(ipn, nt);
> +	return nt;
> +
> + failed_free:
> +	free_netdev(dev);
> +	return NULL;
> +}
> +
> +static void vti_tunnel_uninit(struct net_device *dev)
> +{
> +	struct net *net = dev_net(dev);
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +
> +	if (dev == ipn->fb_tunnel_dev)
> +		RCU_INIT_POINTER(ipn->tunnels_wc[0], NULL);
> +	else
> +		vti_tunnel_unlink(ipn, netdev_priv(dev));
> +	dev_put(dev);
> +}
> +
> +static int vti_err(struct sk_buff *skb, u32 info)
> +{
> +
> +	/* All the routers (except for Linux) return only
> +	 * 8 bytes of packet payload. It means, that precise relaying of
> +	 * ICMP in the real Internet is absolutely infeasible.
> +	 */
> +	struct iphdr *iph = (struct iphdr *)skb->data;
> +	const int type = icmp_hdr(skb)->type;
> +	const int code = icmp_hdr(skb)->code;
> +	struct ip_tunnel *t;
> +	int err;
> +
> +	switch (type) {
> +	default:
> +	case ICMP_PARAMETERPROB:
> +		return 0;
> +
> +	case ICMP_DEST_UNREACH:
> +		switch (code) {
> +		case ICMP_SR_FAILED:
> +		case ICMP_PORT_UNREACH:
> +			/* Impossible event. */
> +			return 0;
> +		case ICMP_FRAG_NEEDED:
> +			/* Soft state for pmtu is maintained by IP core. */
> +			return 0;
> +		default:
> +			/* All others are translated to HOST_UNREACH. */
> +			break;
> +		}
> +		break;
> +	case ICMP_TIME_EXCEEDED:
> +		if (code != ICMP_EXC_TTL)
> +			return 0;
> +		break;
> +	}
> +
> +	err = -ENOENT;
> +
> +	rcu_read_lock();
> +	t = vti_tunnel_lookup(dev_net(skb->dev), iph->daddr, iph->saddr);
> +	if (t == NULL || t->parms.iph.daddr == 0)
> +		goto out;
> +
> +	err = 0;
> +	if (t->parms.iph.ttl == 0 && type == ICMP_TIME_EXCEEDED)
> +		goto out;
> +
> +	if (time_before(jiffies, t->err_time + IPTUNNEL_ERR_TIMEO))
> +		t->err_count++;
> +	else
> +		t->err_count = 1;
> +	t->err_time = jiffies;
> +out:
> +	rcu_read_unlock();
> +	return err;
> +}
> +
> +static inline void vti_ecn_decapsulate(const struct iphdr *outer_iph,
> +					struct sk_buff *skb)
> +{
> +	struct iphdr *inner_iph = ip_hdr(skb);
> +
> +	if (INET_ECN_is_ce(outer_iph->tos))
> +		IP_ECN_set_ce(inner_iph);
> +}

vti_ecn_decapsulate is unused.

> +
> +/*
> + * We dont digest the packet therefore let the packet pass.
> + */
> +static int vti_rcv(struct sk_buff *skb)
> +{
> +	struct ip_tunnel *tunnel;
> +	const struct iphdr *iph = ip_hdr(skb);
> +
> +	rcu_read_lock();
> +	tunnel = vti_tunnel_lookup(dev_net(skb->dev), iph->saddr, iph->daddr);
> +	if (tunnel != NULL) {
> +		struct pcpu_tstats *tstats;
> +
> +		tstats = this_cpu_ptr(tunnel->dev->tstats);
> +		tstats->rx_packets++;
> +		tstats->rx_bytes += skb->len;
> +
> +		skb->dev = tunnel->dev;
> +		skb_dst_drop(skb);

Really need to drop a refcount here? xfrm_input() does that too before
it reinjects the packet into layer 2.

> +		nf_reset(skb);

Same here.

> +		rcu_read_unlock();
> +		/* We do not eat the packet here therefore return 1 */
> +		return 1;
> +	}
> +	rcu_read_unlock();
> +
> +	return -1;
> +}
> +
> +/*
> + *	This function assumes it is being called from dev_queue_xmit()
> + *	and that skb is filled properly by that function.
> + */
> +
> +static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> +	struct ip_tunnel *tunnel = netdev_priv(dev);
> +	struct pcpu_tstats *tstats;
> +	struct net_device_stats *stats = &tunnel->dev->stats;
> +	struct iphdr  *tiph = &tunnel->parms.iph;
> +	u8     tos = tunnel->parms.iph.tos;
> +	struct rtable *rt;		/* Route to the other host */
> +	struct net_device *tdev;	/* Device to other host */
> +	struct iphdr  *old_iph = ip_hdr(skb);
> +	__be32 dst = tiph->daddr;
> +	struct flowi4 fl4;
> +
> +	if (skb->protocol != htons(ETH_P_IP))
> +		goto tx_error;
> +
> +	if (tos&1)
> +		tos = old_iph->tos;
> +
> +	if (!dst) {
> +		/* NBMA tunnel */
> +		rt = skb_rtable(skb);
> +		if (rt == NULL) {
> +			stats->tx_fifo_errors++;
> +			goto tx_error;
> +		}
> +		dst = rt->rt_gateway;
> +		if (dst == 0)
> +			goto tx_error_icmp;
> +	}
> +
> +	memset(&fl4, 0, sizeof(fl4));
> +	flowi4_init_output(&fl4, tunnel->parms.link,
> +		htonl(tunnel->parms.i_key), RT_TOS(tos), RT_SCOPE_UNIVERSE,
> +		IPPROTO_IPIP, 0,
> +		dst, tiph->saddr, 0, 0);
> +	rt = ip_route_output_key(dev_net(dev), &fl4);
> +	if (IS_ERR(rt)) {
> +		dev->stats.tx_carrier_errors++;
> +		goto tx_error_icmp;
> +	}
> +#ifdef CONFIG_XFRM
> +		/* if there is no transform then this tunnel is not functional. */
> +		if (!rt->dst.xfrm) {
> +			stats->tx_carrier_errors++;
> +			goto tx_error_icmp;
> +		}
> +#endif
> +	tdev = rt->dst.dev;
> +
> +	if (tdev == dev) {
> +		ip_rt_put(rt);
> +		stats->collisions++;
> +		goto tx_error;
> +
> +	}
> +
> +
> +	if (tunnel->err_count > 0) {
> +		if (time_before(jiffies,
> +				tunnel->err_time + IPTUNNEL_ERR_TIMEO)) {
> +			tunnel->err_count--;
> +			dst_link_failure(skb);
> +		} else
> +			tunnel->err_count = 0;
> +	}
> +
> +
> +	IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
> +			      IPSKB_REROUTED);
> +	skb_dst_drop(skb);
> +	skb_dst_set(skb, &rt->dst);
> +	nf_reset(skb);
> +	skb->dev = skb_dst(skb)->dev;
> +
> +	tstats = this_cpu_ptr(dev->tstats);
> +	VTI_XMIT(tstats, &dev->stats);
> +	return NETDEV_TX_OK;
> +
> +tx_error_icmp:
> +	dst_link_failure(skb);
> +tx_error:
> +	stats->tx_errors++;
> +	dev_kfree_skb(skb);
> +	return NETDEV_TX_OK;
> +}
> +
> +static int vti_tunnel_bind_dev(struct net_device *dev)
> +{
> +	struct net_device *tdev = NULL;
> +	struct ip_tunnel *tunnel;
> +	struct iphdr *iph;
> +
> +	tunnel = netdev_priv(dev);
> +	iph = &tunnel->parms.iph;
> +
> +	if (iph->daddr) {
> +		struct rtable *rt;
> +		struct flowi4 fl4;
> +		memset(&fl4, 0, sizeof(fl4));
> +		flowi4_init_output(&fl4, tunnel->parms.link,
> +				htonl(tunnel->parms.i_key), RT_TOS(iph->tos), RT_SCOPE_UNIVERSE,
> +				IPPROTO_IPIP, 0,
> +				iph->daddr, iph->saddr, 0, 0);
> +		rt = ip_route_output_key(dev_net(dev), &fl4);
> +		if (!IS_ERR(rt)) {
> +			tdev = rt->dst.dev;
> +			ip_rt_put(rt);
> +		}
> +		dev->flags |= IFF_POINTOPOINT;
> +	}
> +
> +	if (!tdev && tunnel->parms.link)
> +		tdev = __dev_get_by_index(dev_net(dev), tunnel->parms.link);
> +
> +	if (tdev) {
> +		dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
> +		dev->mtu = tdev->mtu;
> +	}
> +	dev->iflink = tunnel->parms.link;
> +	return dev->mtu;
> +}
> +
> +static int
> +vti_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
> +{
> +	int err = 0;
> +	struct ip_tunnel_parm p;
> +	struct ip_tunnel *t;
> +	struct net *net = dev_net(dev);
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +
> +	switch (cmd) {
> +	case SIOCGETTUNNEL:
> +		t = NULL;
> +		if (dev == ipn->fb_tunnel_dev) {
> +			if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) {
> +				err = -EFAULT;
> +				break;
> +			}
> +			t = vti_tunnel_locate(net, &p, 0);
> +		}
> +		if (t == NULL)
> +			t = netdev_priv(dev);
> +		memcpy(&p, &t->parms, sizeof(p));
> +		p.i_flags |= GRE_KEY;
> +		p.o_flags |= GRE_KEY;
> +		if (copy_to_user(ifr->ifr_ifru.ifru_data, &p, sizeof(p)))
> +			err = -EFAULT;
> +		break;
> +
> +	case SIOCADDTUNNEL:
> +	case SIOCCHGTUNNEL:
> +		err = -EPERM;
> +		if (!capable(CAP_NET_ADMIN))
> +			goto done;
> +
> +		err = -EFAULT;
> +		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
> +			goto done;
> +
> +		err = -EINVAL;
> +		if (p.iph.version != 4 || p.iph.protocol != IPPROTO_ESP ||
> +		    p.iph.ihl != 5 || (p.iph.frag_off&htons(~IP_DF)))
> +			goto done;
> +		if (p.iph.ttl)
> +			p.iph.frag_off |= htons(IP_DF);
> +
> +		t = vti_tunnel_locate(net, &p, cmd == SIOCADDTUNNEL);
> +
> +		if (dev != ipn->fb_tunnel_dev && cmd == SIOCCHGTUNNEL) {
> +			if (t != NULL) {
> +				if (t->dev != dev) {
> +					err = -EEXIST;
> +					break;
> +				}
> +			} else {
> +				if (((dev->flags&IFF_POINTOPOINT) && !p.iph.daddr) ||
> +				    (!(dev->flags&IFF_POINTOPOINT) && p.iph.daddr)) {
> +					err = -EINVAL;
> +					break;
> +				}
> +				t = netdev_priv(dev);
> +				vti_tunnel_unlink(ipn, t);
> +				synchronize_net();
> +				t->parms.iph.saddr = p.iph.saddr;
> +				t->parms.iph.daddr = p.iph.daddr;
> +				t->parms.i_key = p.i_key;
> +				t->parms.o_key = p.o_key;
> +				t->parms.iph.protocol = IPPROTO_ESP;
> +				memcpy(dev->dev_addr, &p.iph.saddr, 4);
> +				memcpy(dev->broadcast, &p.iph.daddr, 4);
> +				vti_tunnel_link(ipn, t);
> +				netdev_state_change(dev);
> +			}
> +		}
> +
> +		if (t) {
> +			err = 0;
> +			if (cmd == SIOCCHGTUNNEL) {
> +				t->parms.iph.ttl = p.iph.ttl;
> +				t->parms.iph.tos = p.iph.tos;
> +				t->parms.iph.frag_off = p.iph.frag_off;
> +				t->parms.i_key = p.i_key;
> +				t->parms.o_key = p.o_key;
> +				if (t->parms.link != p.link) {
> +					t->parms.link = p.link;
> +					vti_tunnel_bind_dev(dev);
> +					netdev_state_change(dev);
> +				}
> +			}
> +			if (copy_to_user(ifr->ifr_ifru.ifru_data, &t->parms, sizeof(p)))
> +				err = -EFAULT;
> +		} else
> +			err = (cmd == SIOCADDTUNNEL ? -ENOBUFS : -ENOENT);
> +		break;
> +
> +	case SIOCDELTUNNEL:
> +		err = -EPERM;
> +		if (!capable(CAP_NET_ADMIN))
> +			goto done;
> +
> +		if (dev == ipn->fb_tunnel_dev) {
> +			err = -EFAULT;
> +			if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
> +				goto done;
> +			err = -ENOENT;
> +
> +			t = vti_tunnel_locate(net, &p, 0);
> +			if (t == NULL)
> +				goto done;
> +			err = -EPERM;
> +			if (t->dev == ipn->fb_tunnel_dev)
> +				goto done;
> +			dev = t->dev;
> +		}
> +		unregister_netdevice(dev);
> +		err = 0;
> +		break;
> +
> +	default:
> +		err = -EINVAL;
> +	}
> +
> +done:
> +	return err;
> +}
> +
> +static int vti_tunnel_change_mtu(struct net_device *dev, int new_mtu)
> +{
> +	if (new_mtu < 68 || new_mtu > 0xFFF8)
> +		return -EINVAL;
> +	dev->mtu = new_mtu;
> +	return 0;
> +}
> +
> +static const struct net_device_ops vti_netdev_ops = {
> +	.ndo_init	= vti_tunnel_init,
> +	.ndo_uninit	= vti_tunnel_uninit,
> +	.ndo_start_xmit	= vti_tunnel_xmit,
> +	.ndo_do_ioctl	= vti_tunnel_ioctl,
> +	.ndo_change_mtu	= vti_tunnel_change_mtu,
> +	.ndo_get_stats64  = vti_get_stats64,
> +};
> +
> +static void vti_dev_free(struct net_device *dev)
> +{
> +	free_percpu(dev->tstats);
> +	free_netdev(dev);
> +}
> +
> +static void vti_tunnel_setup(struct net_device *dev)
> +{
> +	dev->netdev_ops		= &vti_netdev_ops;
> +	dev->destructor		= vti_dev_free;
> +
> +	dev->type		= ARPHRD_TUNNEL;
> +	dev->hard_header_len	= LL_MAX_HEADER + sizeof(struct iphdr);
> +	dev->mtu		= ETH_DATA_LEN;
> +	dev->flags		= IFF_NOARP;
> +	dev->iflink		= 0;
> +	dev->addr_len		= 4;
> +	dev->features		|= NETIF_F_NETNS_LOCAL;
> +	dev->features		|= NETIF_F_LLTX;
> +	dev->priv_flags		&= ~IFF_XMIT_DST_RELEASE;
> +}
> +
> +static int vti_tunnel_init(struct net_device *dev)
> +{
> +	struct ip_tunnel *tunnel = netdev_priv(dev);
> +
> +	tunnel->dev = dev;
> +	strcpy(tunnel->parms.name, dev->name);
> +
> +	memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4);
> +	memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4);
> +
> +	dev->tstats = alloc_percpu(struct pcpu_tstats);
> +	if (!dev->tstats)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static int __net_init vti_fb_tunnel_init(struct net_device *dev)
> +{
> +	struct ip_tunnel *tunnel = netdev_priv(dev);
> +	struct iphdr *iph = &tunnel->parms.iph;
> +	struct vti_net *ipn = net_generic(dev_net(dev), vti_net_id);
> +
> +	tunnel->dev = dev;
> +	strcpy(tunnel->parms.name, dev->name);
> +
> +	iph->version		= 4;
> +	iph->protocol		= IPPROTO_ESP;

Why IPPROTO_ESP? What's with the other IPsec protocols?
Shouldn't this be IPPROTO_IPIP?

> +	iph->ihl		= 5;
> +
> +	dev->tstats = alloc_percpu(struct pcpu_tstats);
> +	if (!dev->tstats)
> +		return -ENOMEM;
> +
> +	dev_hold(dev);
> +	rcu_assign_pointer(ipn->tunnels_wc[0], tunnel);
> +	return 0;
> +}
> +
> +static struct xfrm_tunnel vti_handler __read_mostly = {
> +	.handler	=	vti_rcv,
> +	.err_handler	=	vti_err,
> +	.priority	=	1,
> +};
> +
> +static void vti_destroy_tunnels(struct vti_net *ipn, struct list_head *head)
> +{
> +	int prio;
> +
> +	for (prio = 1; prio < 4; prio++) {
> +		int h;
> +		for (h = 0; h < HASH_SIZE; h++) {
> +			struct ip_tunnel *t;
> +
> +			t = rtnl_dereference(ipn->tunnels[prio][h]);
> +			while (t != NULL) {
> +				unregister_netdevice_queue(t->dev, head);
> +				t = rtnl_dereference(t->next);
> +			}
> +		}
> +	}
> +}
> +
> +static int __net_init vti_init_net(struct net *net)
> +{
> +	int err;
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +
> +	ipn->tunnels[0] = ipn->tunnels_wc;
> +	ipn->tunnels[1] = ipn->tunnels_l;
> +	ipn->tunnels[2] = ipn->tunnels_r;
> +	ipn->tunnels[3] = ipn->tunnels_r_l;
> +
> +	ipn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel),
> +					   "ip_vti0",
> +					   vti_tunnel_setup);
> +	if (!ipn->fb_tunnel_dev) {
> +		err = -ENOMEM;
> +		goto err_alloc_dev;
> +	}
> +	dev_net_set(ipn->fb_tunnel_dev, net);
> +
> +	err = vti_fb_tunnel_init(ipn->fb_tunnel_dev);
> +	if (err)
> +		goto err_reg_dev;
> +	ipn->fb_tunnel_dev->rtnl_link_ops = &vti_link_ops;
> +
> +	err = register_netdev(ipn->fb_tunnel_dev);
> +	if (err)
> +		goto err_reg_dev;
> +	return 0;
> +
> +err_reg_dev:
> +	vti_dev_free(ipn->fb_tunnel_dev);
> +err_alloc_dev:
> +	/* nothing */
> +	return err;
> +}
> +
> +static void __net_exit vti_exit_net(struct net *net)
> +{
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +	LIST_HEAD(list);
> +
> +	rtnl_lock();
> +	vti_destroy_tunnels(ipn, &list);
> +	unregister_netdevice_many(&list);
> +	rtnl_unlock();
> +}
> +
> +static struct pernet_operations vti_net_ops = {
> +	.init = vti_init_net,
> +	.exit = vti_exit_net,
> +	.id   = &vti_net_id,
> +	.size = sizeof(struct vti_net),
> +};
> +
> +static int vti_tunnel_validate(struct nlattr *tb[], struct nlattr *data[])
> +{
> +	return 0;
> +}
> +
> +static void vti_netlink_parms(struct nlattr *data[],
> +				struct ip_tunnel_parm *parms)
> +{
> +	memset(parms, 0, sizeof(*parms));
> +
> +	parms->iph.protocol = IPPROTO_ESP;
> +
> +	if (!data)
> +		return;
> +
> +	if (data[IFLA_VTI_LINK])
> +		parms->link = nla_get_u32(data[IFLA_VTI_LINK]);
> +
> +	if (data[IFLA_VTI_IKEY])
> +		parms->i_key = nla_get_be32(data[IFLA_VTI_IKEY]);
> +
> +	if (data[IFLA_VTI_OKEY])
> +		parms->o_key = nla_get_be32(data[IFLA_VTI_OKEY]);
> +
> +	if (data[IFLA_VTI_LOCAL])
> +		parms->iph.saddr = nla_get_be32(data[IFLA_VTI_LOCAL]);
> +
> +	if (data[IFLA_VTI_REMOTE])
> +		parms->iph.daddr = nla_get_be32(data[IFLA_VTI_REMOTE]);
> +
> +}
> +
> +static int vti_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[],
> +			 struct nlattr *data[])
> +{
> +	struct ip_tunnel *nt;
> +	struct net *net = dev_net(dev);
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +	int mtu;
> +	int err;
> +
> +	nt = netdev_priv(dev);
> +	vti_netlink_parms(data, &nt->parms);
> +
> +	if (vti_tunnel_locate(net, &nt->parms, 0))
> +		return -EEXIST;
> +
> +	mtu = vti_tunnel_bind_dev(dev);
> +	if (!tb[IFLA_MTU])
> +		dev->mtu = mtu;
> +
> +	err = register_netdevice(dev);
> +	if (err)
> +		goto out;
> +
> +	dev_hold(dev);
> +	vti_tunnel_link(ipn, nt);
> +
> +out:
> +	return err;
> +	return 0;
> +}
> +
> +static int vti_changelink(struct net_device *dev, struct nlattr *tb[],
> +			    struct nlattr *data[])
> +{
> +	struct ip_tunnel *t, *nt;
> +	struct net *net = dev_net(dev);
> +	struct vti_net *ipn = net_generic(net, vti_net_id);
> +	struct ip_tunnel_parm p;
> +	int mtu;
> +
> +	if (dev == ipn->fb_tunnel_dev)
> +		return -EINVAL;
> +
> +	nt = netdev_priv(dev);
> +	vti_netlink_parms(data, &p);
> +
> +	t = vti_tunnel_locate(net, &p, 0);
> +
> +	if (t) {
> +		if (t->dev != dev)
> +			return -EEXIST;
> +	} else {
> +		t = nt;
> +
> +		vti_tunnel_unlink(ipn, t);
> +		t->parms.iph.saddr = p.iph.saddr;
> +		t->parms.iph.daddr = p.iph.daddr;
> +		t->parms.i_key = p.i_key;
> +		t->parms.o_key = p.o_key;
> +		if (dev->type != ARPHRD_ETHER) {
> +			memcpy(dev->dev_addr, &p.iph.saddr, 4);
> +			memcpy(dev->broadcast, &p.iph.daddr, 4);
> +		}
> +		vti_tunnel_link(ipn, t);
> +		netdev_state_change(dev);
> +	}
> +
> +	if (t->parms.link != p.link) {
> +		t->parms.link = p.link;
> +		mtu = vti_tunnel_bind_dev(dev);
> +		if (!tb[IFLA_MTU])
> +			dev->mtu = mtu;
> +		netdev_state_change(dev);
> +	}
> +
> +	return 0;
> +}
> +
> +static size_t vti_get_size(const struct net_device *dev)
> +{
> +	return
> +		/* IFLA_VTI_LINK */
> +		nla_total_size(4) +
> +		/* IFLA_VTI_IKEY */
> +		nla_total_size(4) +
> +		/* IFLA_VTI_OKEY */
> +		nla_total_size(4) +
> +		/* IFLA_VTI_LOCAL */
> +		nla_total_size(4) +
> +		/* IFLA_VTI_REMOTE */
> +		nla_total_size(4) +
> +		0;
> +}
> +
> +static int vti_fill_info(struct sk_buff *skb, const struct net_device *dev)
> +{
> +	struct ip_tunnel *t = netdev_priv(dev);
> +	struct ip_tunnel_parm *p = &t->parms;
> +
> +	nla_put_u32(skb, IFLA_VTI_LINK, p->link);
> +	nla_put_be32(skb, IFLA_VTI_IKEY, p->i_key);
> +	nla_put_be32(skb, IFLA_VTI_OKEY, p->o_key);
> +	nla_put_be32(skb, IFLA_VTI_LOCAL, p->iph.saddr);
> +	nla_put_be32(skb, IFLA_VTI_REMOTE, p->iph.daddr);
> +
> +	return 0;
> +}
> +
> +static const struct nla_policy vti_policy[IFLA_VTI_MAX + 1] = {
> +	[IFLA_VTI_LINK]		= { .type = NLA_U32 },
> +	[IFLA_VTI_IKEY]		= { .type = NLA_U32 },
> +	[IFLA_VTI_OKEY]		= { .type = NLA_U32 },
> +	[IFLA_VTI_LOCAL]	= { .len = FIELD_SIZEOF(struct iphdr, saddr) },
> +	[IFLA_VTI_REMOTE]	= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
> +};
> +
> +static struct rtnl_link_ops vti_link_ops __read_mostly = {
> +	.kind		= "vti",
> +	.maxtype	= IFLA_VTI_MAX,
> +	.policy		= vti_policy,
> +	.priv_size	= sizeof(struct ip_tunnel),
> +	.setup		= vti_tunnel_setup,
> +	.validate	= vti_tunnel_validate,
> +	.newlink	= vti_newlink,
> +	.changelink	= vti_changelink,
> +	.get_size	= vti_get_size,
> +	.fill_info	= vti_fill_info,
> +};
> +
> +static int __init vti_init(void)
> +{
> +	int err;
> +
> +	pr_info("IPv4 over ESP tunneling driver v4\n");
> +
> +	err = register_pernet_device(&vti_net_ops);
> +	if (err < 0)
> +		return err;
> +	err = xfrm4_mode_tunnel_input_register(&vti_handler);
> +	if (err < 0) {
> +		unregister_pernet_device(&vti_net_ops);
> +		pr_info(KERN_INFO "vti init: can't register tunnel\n");
> +	}
> +
> +	err = rtnl_link_register(&vti_link_ops);
> +	if (err < 0)
> +		goto rtnl_link_failed;
> +
> +	return err;
> +
> +rtnl_link_failed:
> +	xfrm4_mode_tunnel_input_deregister(&vti_handler);
> +	unregister_pernet_device(&vti_net_ops);
> +	return err;
> +}
> +
> +static void __exit vti_fini(void)
> +{
> +	rtnl_link_unregister(&vti_link_ops);
> +	if (xfrm4_mode_tunnel_input_deregister(&vti_handler))
> +		pr_info("vti close: can't deregister tunnel\n");
> +
> +	unregister_pernet_device(&vti_net_ops);
> +}
> +
> +module_init(vti_init);
> +module_exit(vti_fini);
> +MODULE_LICENSE("GPL");
> +MODULE_ALIAS_RTNL_LINK("vti");
> +MODULE_ALIAS_NETDEV("ip_vti0");
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Regression on TX throughput when using bonding
From: Eric Dumazet @ 2012-06-14  9:21 UTC (permalink / raw)
  To: Jean-Michel Hautbois; +Cc: netdev
In-Reply-To: <CAL8zT=j8pEVc7AHwJzk+4skj=1tfRYZ_C7CuXD_08T_VmkwoFg@mail.gmail.com>

On Thu, 2012-06-14 at 10:58 +0200, Jean-Michel Hautbois wrote:
> Hi all,
> 
> I have bisected a regression which concerns TX throughput when using bonding.
> I tested only with 10Gbps cards, as it appears when bandwidth need is
> over 1Gbps on my machine.
> I send UDP multicast packets over bonding and observe the tc result.
> 
> When KO :
> $>tc -s -d qdisc show dev eth1
> qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1
> 1 1 1 1 1 1
>  Sent 1106527591 bytes 273802 pkt (dropped 306419, overlimits 0 requeues 223)
>  backlog 0b 0p requeues 223
> 
> Ok course, when OK, dropped is 0.
> $>tc -s -d qdisc show dev eth1
> qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1
> 1 1 1 1 1 1
>  Sent 1648662087 bytes 408009 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> 
> 
> Here is the incriminated commit:
> 
> fc6055a5ba31e2c14e36e8939f9bf2b6d586a7f5 is the first bad commit
> commit fc6055a5ba31e2c14e36e8939f9bf2b6d586a7f5

>     net: Introduce skb_orphan_try()

So you are saying that if you make skb_orphan_try() doing nothing, it
solves your problem ?

diff --git a/net/core/dev.c b/net/core/dev.c
index cd09819..6df40dd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2096,6 +2096,7 @@ static int dev_gso_segment(struct sk_buff *skb, netdev_features_t features)
  */
 static inline void skb_orphan_try(struct sk_buff *skb)
 {
+#if 0
 	struct sock *sk = skb->sk;
 
 	if (sk && !skb_shinfo(skb)->tx_flags) {
@@ -2106,6 +2107,7 @@ static inline void skb_orphan_try(struct sk_buff *skb)
 			skb->rxhash = sk->sk_hash;
 		skb_orphan(skb);
 	}
+#endif
 }
 
 static bool can_checksum_protocol(netdev_features_t features, __be16 protocol)

^ permalink raw reply related

* Re: [net-next PATCH 01/02] net/ipv4: VTI support rx-path hook in xfrm4_mode_tunnel.
From: Steffen Klassert @ 2012-06-14  9:25 UTC (permalink / raw)
  To: Saurabh; +Cc: netdev
In-Reply-To: <20120608173246.GA11945@debian-saurabh-64.vyatta.com>

On Fri, Jun 08, 2012 at 10:32:46AM -0700, Saurabh wrote:
> 
> 
> Add hook for rx-path xfmr4_mode_tunnel for VTI tunnel module.
> 
> Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
> Reviewed-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> index e0a55df..04214c0 100644
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -1475,6 +1475,8 @@ extern int xfrm4_output(struct sk_buff *skb);
>  extern int xfrm4_output_finish(struct sk_buff *skb);
>  extern int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family);
>  extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
> +extern int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel *handler);
> +extern int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel *handler);
>  extern int xfrm6_extract_header(struct sk_buff *skb);
>  extern int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
>  extern int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
> diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
> index ed4bf11..4fc2944 100644
> --- a/net/ipv4/xfrm4_mode_tunnel.c
> +++ b/net/ipv4/xfrm4_mode_tunnel.c
> @@ -15,6 +15,68 @@
>  #include <net/ip.h>
>  #include <net/xfrm.h>
>  
> +/*
> + * Informational hook. The decap is still done here.
> + */
> +static struct xfrm_tunnel __rcu *rcv_notify_handlers __read_mostly;
> +static DEFINE_MUTEX(xfrm4_mode_tunnel_input_mutex);
> +
> +int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel *handler)
> +{
> +	struct xfrm_tunnel __rcu **pprev;
> +	struct xfrm_tunnel *t;
> +
> +	int ret = -EEXIST;
> +	int priority = handler->priority;
> +
> +	mutex_lock(&xfrm4_mode_tunnel_input_mutex);
> +
> +	for (pprev = &rcv_notify_handlers;
> +		(t = rcu_dereference_protected(*pprev,
> +		lockdep_is_held(&xfrm4_mode_tunnel_input_mutex))) != NULL;
> +		pprev = &t->next) {
> +		if (t->priority > priority)
> +			break;
> +		if (t->priority == priority)
> +			goto err;
> +
> +	}
> +
> +	handler->next = *pprev;
> +	rcu_assign_pointer(*pprev, handler);
> +
> +	ret = 0;
> +
> +err:
> +	mutex_unlock(&xfrm4_mode_tunnel_input_mutex);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(xfrm4_mode_tunnel_input_register);
> +
> +int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel *handler)
> +{
> +	struct xfrm_tunnel __rcu **pprev;
> +	struct xfrm_tunnel *t;
> +	int ret = -ENOENT;
> +
> +	mutex_lock(&xfrm4_mode_tunnel_input_mutex);
> +	for (pprev = &rcv_notify_handlers;
> +		(t = rcu_dereference_protected(*pprev,
> +		lockdep_is_held(&xfrm4_mode_tunnel_input_mutex))) != NULL;
> +		pprev = &t->next) {
> +		if (t == handler) {
> +			*pprev = handler->next;
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&xfrm4_mode_tunnel_input_mutex);
> +	synchronize_net();
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(xfrm4_mode_tunnel_input_deregister);
> +
>  static inline void ipip_ecn_decapsulate(struct sk_buff *skb)
>  {
>  	struct iphdr *inner_iph = ipip_hdr(skb);
> @@ -64,8 +126,14 @@ static int xfrm4_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
>  	return 0;
>  }
>  
> +#define for_each_input_rcu(head, handler)	\
> +	for (handler = rcu_dereference(head);	\
> +		handler != NULL;		\
> +		handler = rcu_dereference(handler->next))  \
> +
>  static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
>  {
> +	struct xfrm_tunnel *handler;
>  	int err = -EINVAL;
>  
>  	if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPIP)
> @@ -74,6 +142,10 @@ static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
>  	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
>  		goto out;
>  
> +	/* The handlers do not consume the skb. */
> +	for_each_input_rcu(rcv_notify_handlers, handler)
> +		handler->handler(skb);

I'm not sure if this is the right place to add your handler.
My understanding of an IPsec tunnel device would be to
receive the packet first and then do IPsec processing.
Here it happens the other way arround.

Why didn't you register a tunnel handler and call the
xfrm tunnel handler from that?

> +
>  	if (skb_cloned(skb) &&
>  	    (err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC)))
>  		goto out;
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/5] drivers/net/ethernet/dec/tulip: Use standard __set_bit_le() function
From: Akinobu Mita @ 2012-06-14  9:36 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: Grant Grundler, Takuya Yoshikawa, akpm, bhutchings, grundler,
	arnd, benh, avi, mtosatti, linux-net-drivers, netdev,
	linux-kernel, linux-arch, kvm
In-Reply-To: <20120614072831.63845f6ddf024ece28e49f5e@gmail.com>

2012/6/14 Takuya Yoshikawa <takuya.yoshikawa@gmail.com>:
> On Wed, 13 Jun 2012 08:21:18 -0700
> Grant Grundler <grantgrundler@gmail.com> wrote:
>
>> >> >> Should this hash_table be converted from u16 hash_table[32] to
>> >> >> DECLARE_BITMAP(hash_table, 16 * 32) to ensure that it is aligned
>> >> >> on long-word boundary?
>> >> >
>> >> > I think hash_table is already long-word aligned because it is placed
>> >> > right after a pointer.
>> >>
>> >> I recommend converting to proper bitmap.  Because such an implicit
>> >> assumption is easily broken by someone touching this function.
>> >
>> > Do you mean something like:
>> >        DECLARE_BITMAP(__hash_table, 16 * 32);
>> >        u16 *hash_table = (u16 *)__hash_table;
>> > ?
>> >
>> > Grant, what do you think about this?
>>
>> Hi Takuya,
>> two thoughts:
>> 1) while I agree with Akinobu and thank him for pointing out a
>> _potential_ alignment problem, this is a separate issue and your
>> existing patch should go in anyway. There are probably other drivers
>> with _potential_ alignment issues. Akinobu could get credit for
>> finding them by submitting patches after reviewing calls to set_bit
>> and set_bit_le() - similar to what you are doing now.
>
> I prefer approach 1.
>
> hash_table is local in build_setup_frame_hash(), so if further
> improvement is also required, we can do that locally there later.

This potential alignment problem is introduced by this patch.  Because
the original set_bit_le() in tulip driver can handle unaligned bitmap.
This is why I recommended it should be fixed in this patch.

But please just ignore me if I'm too much paranoid.  And I'll handle
this issue if no one wants to do it.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox