Netdev List
* Re: shutdown oops in xt_compat_calc_jump
From: Eric Dumazet @ 2011-04-05  6:24 UTC
  To: Patrick McHardy; +Cc: dann frazier, netdev, netfilter-devel@vger.kernel.org
In-Reply-To: <1301957293.3021.191.camel@edumazet-laptop>

On Tuesday, 5 April 2011 at 00:48 +0200, Eric Dumazet wrote:
> On Monday, 4 April 2011 at 22:37 +0200, Eric Dumazet wrote:
> > On Monday, 4 April 2011 at 22:02 +0200, Patrick McHardy wrote:
> > > CCed netfilter-devel.
> > > 
> > > On 04.04.2011 21:48, dann frazier wrote:
> > > > fyi, noticed this oops when shutting down a system running top of git
> > > > (@ 78fca1be)
> > > > 
> > > > [ 1169.794644] cfg80211: Calling CRDA to update world regulatory domain
> > > > [ 1170.490646] bluetoothd[2029]: segfault at f8ad9944 ip 00000000f77045e0 sp 00000000ffcb14e0 error 4 in bluetoothd[f76bf000+8b000]
> > > > [ 1170.543817] BUG: unable to handle kernel paging request at 00000001dc1be9f8
> > > > [ 1170.543875] IP: [<ffffffffa051e7b0>] xt_compat_calc_jump+0x25/0x6f [x_tables]
> > > > [ 1170.543927] PGD 1215b3067 PUD 0 
> > > > [ 1170.543955] Oops: 0000 [#1] SMP 
> > > > [ 1170.543982] last sysfs file: /sys/module/bridge/initstate
> > > > [ 1170.544017] CPU 3 
> > > > [ 1170.544031] Modules linked in: ebtable_broute ebtable_filter vfat msdos fat ext3 jbd ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc acpi_cpufreq mperf cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats binfmt_misc kvm(-) fuse ext2 loop snd_hda_codec_hdmi snd_hda_codec_conexant arc4 ecb snd_usb_audio snd_usbmidi_lib snd_seq_midi snd_seq_midi_event snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_rawmidi i915 drm_kms_helper thinkpad_acpi snd_seq iwlagn snd_timer snd_seq_device drm snd mac80211 psmouse btusb serio_raw bluetooth evdev tpm_tis snd_page_alloc tpm i2c_i801 i2c_algo_bit cfg80211 battery soundcore nvram tpm_bios i2c_core rfkill wmi ac power_supply video button processor ext4 mbcache jbd2 crc16 sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sd_mod crc_t10dif usbhid hid usb_storage ahci libahci libata ehci_hcd scsi_mod usbcore e1000e thermal thermal_sys [last unloaded: kvm_intel]
> > > > [ 1170.544836] 
> > > > [ 1170.544849] Pid: 4901, comm: ebtables Not tainted 2.6.39-rc1+ #9 LENOVO 2516CTO/2516CTO
> > > > [ 1170.544902] RIP: 0010:[<ffffffffa051e7b0>]  [<ffffffffa051e7b0>] xt_compat_calc_jump+0x25/0x6f [x_tables]
> > > > [ 1170.544958] RSP: 0018:ffff880121473cf8  EFLAGS: 00010217
> > > > [ 1170.544989] RAX: 000000003b837d3f RBX: 0000000000000090 RCX: 000000007706fa7f
> > > > [ 1170.545029] RDX: 0000000000000000 RSI: 0000000000000090 RDI: 000000003b837d3f
> > > > [ 1170.545067] RBP: ffffc900111a3000 R08: 0000000000000000 R09: dead000000200200
> > > > [ 1170.545104] R10: dead000000100100 R11: 0000000000001311 R12: ffff880121473d88
> > > > [ 1170.545147] R13: ffffc900111a6000 R14: ffffffff817de300 R15: 0000000000000000
> > > > [ 1170.545185] FS:  0000000000000000(0000) GS:ffff880137d80000(0063) knlGS:00000000f761b6c0
> > > > [ 1170.545227] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
> > > > [ 1170.545258] CR2: 00000001dc1be9f8 CR3: 0000000125868000 CR4: 00000000000006e0
> > > > [ 1170.545297] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [ 1170.545334] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > [ 1170.545375] Process ebtables (pid: 4901, threadinfo ffff880121472000, task ffff8801322d1ac0)
> > > > [ 1170.545418] Stack:
> > > > [ 1170.545433]  0000000000000090 ffffffffa0576d46 f7007265746c6966 0000000000000054
> > > > [ 1170.545479]  0000000000000000 0000000000000000 000000000000000e 0000000000000090
> > > > [ 1170.545529]  0000000000000000 0000000008af2180 0000000008af21b0 0000000008af21e0
> > > > [ 1170.545579] Call Trace:
> > > > [ 1170.545600]  [<ffffffffa0576d46>] ? compat_do_replace+0x117/0x221 [ebtables]
> > > > [ 1170.545639]  [<ffffffffa0577392>] ? compat_do_ebt_set_ctl+0x55/0xbb [ebtables]
> > > > [ 1170.545688]  [<ffffffff810337e3>] ? need_resched+0x1a/0x23
> > > > [ 1170.545723]  [<ffffffff810337f1>] ? should_resched+0x5/0x24
> > > > [ 1170.545730]  [<ffffffff81314cc5>] ? _cond_resched+0x9/0x20
> > > > [ 1170.545733]  [<ffffffff813152fe>] ? mutex_lock_interruptible+0x18/0x32
> > > > [ 1170.545738]  [<ffffffff8128490b>] ? nf_sockopt_find.clone.1+0xda/0xec
> > > > [ 1170.545742]  [<ffffffff81284996>] ? compat_nf_sockopt+0x79/0xa5
> > > > [ 1170.545744]  [<ffffffff810337f1>] ? should_resched+0x5/0x24
> > > > [ 1170.545747]  [<ffffffff812849f3>] ? compat_nf_setsockopt+0x1a/0x1f
> > > > [ 1170.545751]  [<ffffffff8128fb35>] ? compat_ip_setsockopt+0x80/0xa0
> > > > [ 1170.545756]  [<ffffffff812784a2>] ? compat_sys_setsockopt+0x1d5/0x204
> > > > [ 1170.545759]  [<ffffffff810337f1>] ? should_resched+0x5/0x24
> > > > [ 1170.545761]  [<ffffffff81314cc5>] ? _cond_resched+0x9/0x20
> > > > [ 1170.545764]  [<ffffffff812788a5>] ? compat_sys_socketcall+0x148/0x1a7
> > > > [ 1170.545768]  [<ffffffff8131d2c0>] ? sysenter_dispatch+0x7/0x2e
> > > > [ 1170.545769] Code: 5d 41 5e 41 5f c3 40 0f b6 ff 53 31 d2 48 6b ff 70 48 03 3d 03 1b 00 00 8b 4f 6c 4c 8b 47 60 ff c9 eb 27 8d 04 11 d1 f8 48 63 f8 
> > > > [ 1170.545787] RIP  [<ffffffffa051e7b0>] xt_compat_calc_jump+0x25/0x6f [x_tables]
> > > > [ 1170.545792]  RSP <ffff880121473cf8>
> > > > [ 1170.545794] CR2: 00000001dc1be9f8
> > > > [ 1170.654269] ---[ end trace d44667d90dcbd115 ]---
> > > > [ 1170.662411] fuse exit
> > > > Kernel logging (proc) stopped.
> > > > --
> > 
> > 
> > Hmm, commit 255d0dc34068a976550ce555e must have a problem with ebtables?
> > 
> > Dann, could you tell us what you are doing with ebtables?
> > 
> > Thanks
> > 
> 
> There was certainly a typo in the above commit, but fixing it is not
> enough to make ebtables work in COMPAT mode.
> 
> Hmm...
> 

Update: xt_compat_calc_jump() is missing the bit below, and I still have
to find the ebtables problem.

I'll provide a cumulative patch once done.

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index a9adf4c..1acda09 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -473,6 +473,8 @@ int xt_compat_calc_jump(u_int8_t af, unsigned int offset)
 		else
 			return mid ? tmp[mid - 1].delta : 0;
 	}
+	if (left)
+		return tmp[left - 1].delta;
 	WARN_ON_ONCE(1);
 	return 0;
 }
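
For illustration, the lookup with the added fallthrough can be sketched in
userspace C as below. This is a minimal sketch and not the kernel code:
struct delta_entry and calc_jump() are hypothetical names, and the table is
assumed to be sorted by offset and to hold cumulative deltas. When the
offset is not matched exactly, the loop ends with left pointing just past
the last entry whose offset is smaller, so that entry's delta is returned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the compat offset table. */
struct delta_entry {
	unsigned int offset;	/* native offset, table sorted ascending */
	int delta;		/* assumed cumulative size difference */
};

/*
 * Return the accumulated delta of all entries located strictly before
 * 'offset'. An exact match and a miss both resolve to the entry just
 * before the search position; an offset before the first entry gives 0.
 */
static int calc_jump(const struct delta_entry *tab, int n, unsigned int offset)
{
	int left = 0, right = n - 1;

	assert(tab != NULL || n == 0);

	while (left <= right) {
		int mid = left + (right - left) / 2;

		if (offset > tab[mid].offset)
			left = mid + 1;
		else if (offset < tab[mid].offset)
			right = mid - 1;
		else
			return mid ? tab[mid - 1].delta : 0;
	}
	/* No exact match: 'left' is the number of entries below 'offset'. */
	return left ? tab[left - 1].delta : 0;
}
```

With entries at offsets 10, 20 and 30, a jump target of 25 falls out of the
loop with left == 2 and picks up the delta stored for offset 20.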



* RE: Netxen packet loss with VLANs and LRO (was: [PATCH] netxen: fix LRO disable warning)
From: Amit Salecha @ 2011-04-05  5:38 UTC
  To: Marc Haber
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Ameen Rahman,
	Rajesh Borundia
In-Reply-To: <20110403190555.GA16196@torres.zugschlus.de>

>
> Hi,
>
> On Mon, Mar 21, 2011 at 03:37:08AM -0700, Amit Kumar Salecha wrote:
> > netxen_nic_set_flags() rejects data if other flag than ETH_FLAG_LRO
> is set.
> > Driver also supports NETIF_F_HW_VLAN_TX.
> > Now compare data with ethtool_op_get_flags(), to get all supported
> features.
>
> Could that be the cause for packet loss on kernel 2.6.38.2 if:
>
>   - receiving card is NX3031 [4040:0100]
>   - frames are received with VLAN tags
>   - large received offload is on.
>

If IP forwarding or routing is enabled, then you may see packet loss.

> Packet Loss of this kind is noticed when doing TCP data transfers
> towards the host with the Netxen Interface and the TCP session is
> terminated on the Netxen host itself. TCP sessions routed through the
> Netxen host are not affected.
>
> My ethtool doesn't allow me to influence the LRO setting alone - it is
> disabled when I set rx off but doesn't come on again when rx is set to
> on again. So, ethtool -K rx off, ethtool -K rx on fixes the issue.
>
If rx csum is disabled, LRO will be disabled too. LRO won't be re-enabled automatically when you enable rx csum again.
You need to enable LRO explicitly.

> Is this a known bug, maybe with an available patch?
>
You need to retest with this patch http://patchwork.ozlabs.org/patch/88060/. This patch got applied instead of mine.




* [GIT] Networking
From: David Miller @ 2011-04-05  5:27 UTC
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) TCP ipv6 mis-interprets error pointer, from Boris Ostrovsky.

2) Ease a BUG() into a warning in TCP, from Ilpo Järvinen.

3) SCTP under-allocates asconf-ack chunks, from Wei Yongjun.

4) fix dev_ethtool_get_rx_csum() NETIF_F_RXCSUM handling, from
   Michał Mirosław.

5) Add some device IDs to rt2x00, from Xose Vazquez Perez.

6) starfire dma_addr_t size test no longer needs to be a mess, from
   FUJITA Tomonori.

Please pull, thanks a lot!

The following changes since commit d7c764c4c7b782c660b4600b0bff2e3509892a4d:

  Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip (2011-04-04 08:37:45 -0700)

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Andrei Emeltchenko (1):
      Bluetooth: delete hanging L2CAP channel

Arnd Bergmann (1):
      usbnet: use eth%d name for known ethernet devices

Boris Ostrovsky (1):
      ipv6: Don't pass invalid dst_entry pointer to dst_release().

Christian Lamparter (1):
      carl9170: Fix tx aggregation problems with some clients

Daniel Halperin (1):
      mac80211: fix aggregation frame release during timeout

David S. Miller (1):
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless-2.6

FUJITA Tomonori (1):
      starfire: clean up dma_addr_t size test

Felix Fietkau (2):
      mac80211: fix a crash in minstrel_ht in HT mode with no supported MCS rates
      ath9k: fix a chip wakeup related crash in ath9k_start

Gustavo F. Padovan (1):
      Bluetooth: Fix HCI_RESET command synchronization

Ilpo Järvinen (1):
      tcp: len check is unnecessarily devastating, change to WARN_ON

Johan Hedberg (1):
      Bluetooth: Fix missing hci_dev_lock_bh in user_confirm_reply

Johannes Berg (1):
      iwlegacy: fix bugs in change_interface

Juuso Oikarinen (1):
      cfg80211: fix BSS double-unlinking (continued)

Marc-Antoine Perennou (1):
      Bluetooth: add support for Apple MacBook Pro 8,2

Mariusz Kozlowski (3):
      mac80211: fix possible NULL pointer dereference
      cfg80211:: fix possible NULL pointer dereference
      mlx4: fix kfree on error path in new_steering_entry()

Michał Mirosław (1):
      net: Fix dev dev_ethtool_get_rx_csum() for forced NETIF_F_RXCSUM

Petr Štetiar (1):
      mac80211: fix NULL pointer dereference in ieee80211_key_alloc()

Stanislaw Gruszka (2):
      iwl3945: do not deprecate software scan
      iwl3945: disable hw scan by default

Suraj Sumangala (1):
      Bluetooth: Increment unacked_frames count only the first transmit

Thomas Gleixner (1):
      Bluetooth: Fix warning with hci_cmd_timer

Vinicius Costa Gomes (1):
      Bluetooth: Fix sending LE data over USB

Wei Yongjun (2):
      sctp: fix auth_hmacs field's length of struct sctp_cookie
      sctp: malloc enough room for asconf-ack chunk

Xose Vazquez Perez (1):
      wireless: rt2x00: rt2800usb.c add and identify ids

 drivers/bluetooth/btusb.c                    |    6 ++++-
 drivers/net/mlx4/mcg.c                       |    4 +-
 drivers/net/starfire.c                       |    6 +----
 drivers/net/usb/cdc_eem.c                    |    2 +-
 drivers/net/usb/cdc_ether.c                  |    2 +-
 drivers/net/usb/cdc_ncm.c                    |    2 +-
 drivers/net/usb/cdc_subset.c                 |    8 ++++++
 drivers/net/usb/gl620a.c                     |    2 +-
 drivers/net/usb/net1080.c                    |    2 +-
 drivers/net/usb/plusb.c                      |    2 +-
 drivers/net/usb/rndis_host.c                 |    2 +-
 drivers/net/usb/usbnet.c                     |    3 +-
 drivers/net/usb/zaurus.c                     |    8 +++---
 drivers/net/wireless/ath/ath9k/main.c        |    4 +++
 drivers/net/wireless/ath/carl9170/carl9170.h |    1 +
 drivers/net/wireless/ath/carl9170/main.c     |    1 +
 drivers/net/wireless/ath/carl9170/tx.c       |    7 ++++++
 drivers/net/wireless/iwlegacy/iwl-core.c     |   10 ++++++++
 drivers/net/wireless/iwlegacy/iwl3945-base.c |    7 ++---
 drivers/net/wireless/rt2x00/rt2800usb.c      |   10 ++++++--
 include/linux/netdevice.h                    |    4 +-
 include/linux/usb/usbnet.h                   |    2 +
 include/net/bluetooth/hci.h                  |    2 +
 include/net/sctp/structs.h                   |    2 +-
 net/bluetooth/hci_core.c                     |   10 ++++++-
 net/bluetooth/hci_event.c                    |    4 ++-
 net/bluetooth/l2cap_core.c                   |    4 ++-
 net/bluetooth/l2cap_sock.c                   |    5 ++-
 net/bluetooth/mgmt.c                         |    2 +
 net/ipv4/tcp_output.c                        |    3 +-
 net/ipv6/tcp_ipv6.c                          |    1 +
 net/mac80211/key.c                           |    7 +++--
 net/mac80211/rc80211_minstrel_ht.c           |   25 +++++++++++++++------
 net/mac80211/rx.c                            |    3 +-
 net/sctp/sm_make_chunk.c                     |    4 +-
 net/wireless/scan.c                          |   31 +++++++++++++++++--------
 36 files changed, 138 insertions(+), 60 deletions(-)


* [PATCH v4] net: Allow no-cache copy from user on transmit
From: Tom Herbert @ 2011-04-05  4:03 UTC
  To: davem, netdev

This patch uses __copy_from_user_nocache on transmit to bypass the data
cache for a performance improvement.  skb_add_data_nocache and
skb_copy_to_page_nocache can be called by sendmsg functions to use
this feature; initial support is in tcp_sendmsg.  This functionality is
configurable per device using ethtool.

Presumably, this feature is only useful when the driver does
not touch the data.  The feature is turned on by default if a device
indicates that it does some form of checksum offload; it is off by
default for devices that do no checksum offload or indicate that no
checksum is necessary.  In the former case copy-checksum is probably
done anyway; in the latter case the device is likely loopback, where
the no-cache copy is probably not beneficial.

This patch was tested using 200 instances of netperf TCP_RR with
1400 byte request and one byte reply.  Platform is 16 core AMD x86.

No-cache copy disabled:
   672703 tps, 97.13% utilization
   50/90/99% latency 244.31 484.205 1028.41

No-cache copy enabled:
   702113 tps, 96.16% utilization
   50/90/99% latency 238.56 467.56 956.955

Using 14000 byte request and response sizes demonstrates the
effects more dramatically:

No-cache copy disabled:
   79571 tps, 34.34% utilization
   50/90/95% latency 1584.46 2319.59 5001.76

No-cache copy enabled:
   83856 tps, 34.81% utilization
   50/90/95% latency 2508.42 2622.62 2735.88

Note especially the effect on latency tail (95th percentile).

This seems to provide a nice performance improvement and is
consistent in the tests I ran.  Presumably, this would provide
the greatest benefits in the presence of an application workload
stressing the cache while a lot of data is being transmitted.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/bonding/bond_main.c |    2 +-
 include/linux/netdevice.h       |    3 +-
 include/net/sock.h              |   53 +++++++++++++++++++++++++++++++++++++++
 net/core/dev.c                  |   12 +++++++++
 net/core/ethtool.c              |    2 +-
 net/ipv4/tcp.c                  |    7 +++--
 6 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 16d6fe9..b51e021 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1407,7 +1407,7 @@ static int bond_compute_features(struct bonding *bond)
 	int i;
 
 	features &= ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES);
-	features |=  NETIF_F_GSO_MASK | NETIF_F_NO_CSUM;
+	features |=  NETIF_F_GSO_MASK | NETIF_F_NO_CSUM | NETIF_F_NOCACHE_COPY;
 
 	if (!bond->first_slave)
 		goto done;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 423a544..1828119 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1066,6 +1066,7 @@ struct net_device {
 #define NETIF_F_NTUPLE		(1 << 27) /* N-tuple filters supported */
 #define NETIF_F_RXHASH		(1 << 28) /* Receive hashing offload */
 #define NETIF_F_RXCSUM		(1 << 29) /* Receive checksumming offload */
+#define NETIF_F_NOCACHE_COPY	(1 << 30) /* Use no-cache copyfromuser */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
@@ -1081,7 +1082,7 @@ struct net_device {
 	/* = all defined minus driver/device-class-related */
 #define NETIF_F_NEVER_CHANGE	(NETIF_F_HIGHDMA | NETIF_F_VLAN_CHALLENGED | \
 				  NETIF_F_LLTX | NETIF_F_NETNS_LOCAL)
-#define NETIF_F_ETHTOOL_BITS	(0x3f3fffff & ~NETIF_F_NEVER_CHANGE)
+#define NETIF_F_ETHTOOL_BITS	(0x7f3fffff & ~NETIF_F_NEVER_CHANGE)
 
 	/* List of features with software fallbacks. */
 #define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | \
diff --git a/include/net/sock.h b/include/net/sock.h
index da0534d..43bd515 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -52,6 +52,7 @@
 #include <linux/mm.h>
 #include <linux/security.h>
 #include <linux/slab.h>
+#include <linux/uaccess.h>
 
 #include <linux/filter.h>
 #include <linux/rculist_nulls.h>
@@ -1389,6 +1390,58 @@ static inline void sk_nocaps_add(struct sock *sk, int flags)
 	sk->sk_route_caps &= ~flags;
 }
 
+static inline int skb_do_copy_data_nocache(struct sock *sk, struct sk_buff *skb,
+					   char __user *from, char *to,
+					   int copy)
+{
+	if (skb->ip_summed == CHECKSUM_NONE) {
+		int err = 0;
+		__wsum csum = csum_and_copy_from_user(from, to, copy, 0, &err);
+		if (err)
+			return err;
+		skb->csum = csum_block_add(skb->csum, csum, skb->len);
+	} else if (sk->sk_route_caps & NETIF_F_NOCACHE_COPY) {
+		if (!access_ok(VERIFY_READ, from, copy) ||
+		    __copy_from_user_nocache(to, from, copy))
+			return -EFAULT;
+	} else if (copy_from_user(to, from, copy))
+		return -EFAULT;
+
+	return 0;
+}
+
+static inline int skb_add_data_nocache(struct sock *sk, struct sk_buff *skb,
+				       char __user *from, int copy)
+{
+	int err, off = skb->len;
+
+	err = skb_do_copy_data_nocache(sk, skb, from, skb_put(skb, copy), copy);
+	if (err)
+		__skb_trim(skb, off);
+
+	return err;
+}
+
+static inline int skb_copy_to_page_nocache(struct sock *sk, char __user *from,
+					   struct sk_buff *skb,
+					   struct page *page,
+					   int off, int copy)
+{
+	int err;
+
+	err = skb_do_copy_data_nocache(sk, skb, from,
+				       page_address(page) + off, copy);
+	if (err)
+		return err;
+
+	skb->len	     += copy;
+	skb->data_len	     += copy;
+	skb->truesize	     += copy;
+	sk->sk_wmem_queued   += copy;
+	sk_mem_charge(sk, copy);
+	return 0;
+}
+
 static inline int skb_copy_to_page(struct sock *sk, char __user *from,
 				   struct sk_buff *skb, struct page *page,
 				   int off, int copy)
diff --git a/net/core/dev.c b/net/core/dev.c
index 02f5637..5d0b4f6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5425,6 +5425,14 @@ int register_netdevice(struct net_device *dev)
 		dev->features &= ~NETIF_F_GSO;
 	}
 
+	/* Turn on no cache copy if HW is doing checksum */
+	dev->hw_features |= NETIF_F_NOCACHE_COPY;
+	if ((dev->features & NETIF_F_ALL_CSUM) &&
+	    !(dev->features & NETIF_F_NO_CSUM)) {
+		dev->wanted_features |= NETIF_F_NOCACHE_COPY;
+		dev->features |= NETIF_F_NOCACHE_COPY;
+	}
+
 	/* Enable GRO and NETIF_F_HIGHDMA for vlans by default,
 	 * vlan_dev_init() will do the dev->features check, so these features
 	 * are enabled only if supported by underlying device.
@@ -6182,6 +6190,10 @@ u32 netdev_increment_features(u32 all, u32 one, u32 mask)
 		}
 	}
 
+	/* If device can't no cache copy, don't do for all */
+	if (!(one & NETIF_F_NOCACHE_COPY))
+		all &= ~NETIF_F_NOCACHE_COPY;
+
 	one |= NETIF_F_ALL_CSUM;
 
 	one |= all & NETIF_F_ONE_FOR_ALL;
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 439e4b0..719670a 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -359,7 +359,7 @@ static const char netdev_features_strings[ETHTOOL_DEV_FEATURE_WORDS * 32][ETH_GS
 	/* NETIF_F_NTUPLE */          "rx-ntuple-filter",
 	/* NETIF_F_RXHASH */          "rx-hashing",
 	/* NETIF_F_RXCSUM */          "rx-checksum",
-	"",
+	/* NETIF_F_NOCACHE_COPY */    "tx-nocache-copy",
 	"",
 };
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b22d450..054a59d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -999,7 +999,8 @@ new_segment:
 				/* We have some space in skb head. Superb! */
 				if (copy > skb_tailroom(skb))
 					copy = skb_tailroom(skb);
-				if ((err = skb_add_data(skb, from, copy)) != 0)
+				err = skb_add_data_nocache(sk, skb, from, copy);
+				if (err)
 					goto do_fault;
 			} else {
 				int merge = 0;
@@ -1042,8 +1043,8 @@ new_segment:
 
 				/* Time to copy data. We are close to
 				 * the end! */
-				err = skb_copy_to_page(sk, from, skb, page,
-						       off, copy);
+				err = skb_copy_to_page_nocache(sk, from, skb,
+							       page, off, copy);
 				if (err) {
 					/* If this page was new, give it to the
 					 * socket so it does not get leaked.
-- 
1.7.3.1
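
The default policy described in the changelog, together with the
aggregation rule the patch adds to netdev_increment_features(), can be
sketched in plain C as below. This is only a sketch under assumptions: the
X_F_* flag values and the helper names are made up for illustration and
are not the kernel's definitions (those live in include/linux/netdevice.h).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values for this sketch only. */
#define X_F_IP_CSUM		(1u << 1)
#define X_F_NO_CSUM		(1u << 2)
#define X_F_HW_CSUM		(1u << 3)
#define X_F_IPV6_CSUM		(1u << 4)
#define X_F_ALL_CSUM		(X_F_IP_CSUM | X_F_NO_CSUM | \
				 X_F_HW_CSUM | X_F_IPV6_CSUM)
#define X_F_NOCACHE_COPY	(1u << 30)

/*
 * Default at registration time: enable no-cache copy only when the device
 * offloads checksums and does not claim that no checksum is needed.
 */
static uint32_t default_nocache_copy(uint32_t features)
{
	if ((features & X_F_ALL_CSUM) && !(features & X_F_NO_CSUM))
		features |= X_F_NOCACHE_COPY;
	return features;
}

/*
 * Aggregation for stacked devices: the combined set keeps the bit only
 * if the device being folded in has it as well.
 */
static uint32_t merge_nocache_copy(uint32_t all, uint32_t one)
{
	if (!(one & X_F_NOCACHE_COPY))
		all &= ~X_F_NOCACHE_COPY;
	return all;
}

/* Usage: a checksum-offloading NIC gets the bit, a loopback-like one not. */
static void example_defaults(void)
{
	assert(default_nocache_copy(X_F_HW_CSUM) & X_F_NOCACHE_COPY);
	assert(!(default_nocache_copy(X_F_NO_CSUM) & X_F_NOCACHE_COPY));
}
```

Since the bit is within NETIF_F_ETHTOOL_BITS in the patch, the default is
only a starting point and can still be changed per device.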



* [PATCH] IPVS: combine consecutive #ifdef CONFIG_PROC_FS blocks
From: Simon Horman @ 2011-04-05  3:22 UTC
  To: lvs-devel, netdev, netfilter-devel, netfilter
  Cc: Wensong Zhang, Julian Anastasov, Patrick McHardy, Simon Horman
In-Reply-To: <1301973732-8989-1-git-send-email-horms@verge.net.au>

Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_ctl.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 33733c8..36f4495 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1984,9 +1984,6 @@ static const struct file_operations ip_vs_info_fops = {
 	.release = seq_release_private,
 };
 
-#endif


* [GIT PULL nf-next-2.6] IPVS
From: Simon Horman @ 2011-04-05  3:22 UTC
  To: lvs-devel, netdev, netfilter-devel, netfilter
  Cc: Wensong Zhang, Julian Anastasov, Patrick McHardy

Hi Patrick,

please consider pulling
git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next-2.6.git master
to get the following minor change from myself. This pull request is
based on nf-next-2.6.

Simon Horman (1):
      IPVS: combine consecutive #ifdef CONFIG_PROC_FS blocks

 net/netfilter/ipvs/ip_vs_ctl.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)



* Re: [PATCH 07/19] timberdale: mfd_cell is now implicitly available to drivers
From: Grant Likely @ 2011-04-05  3:04 UTC
  To: Samuel Ortiz
  Cc: Andres Salomon, linux-kernel, Mark Brown, khali, ben-linux,
	Peter Korsgaard, Mauro Carvalho Chehab, David Brownell, linux-i2c,
	linux-media, netdev, spi-devel-general, Mocean Laboratories,
	Greg Kroah-Hartman
In-Reply-To: <20110404100314.GC2751@sortiz-mobl>

On Mon, Apr 04, 2011 at 12:03:15PM +0200, Samuel Ortiz wrote:
> On Fri, Apr 01, 2011 at 05:58:44PM -0600, Grant Likely wrote:
> > On Fri, Apr 1, 2011 at 5:52 PM, Samuel Ortiz <sameo@linux.intel.com> wrote:
> > > On Fri, Apr 01, 2011 at 11:56:35AM -0600, Grant Likely wrote:
> > >> On Fri, Apr 1, 2011 at 11:47 AM, Andres Salomon <dilinger@queued.net> wrote:
> > >> > On Fri, 1 Apr 2011 13:20:31 +0200
> > >> > Samuel Ortiz <sameo@linux.intel.com> wrote:
> > >> >
> > >> >> Hi Grant,
> > >> >>
> > >> >> On Thu, Mar 31, 2011 at 05:05:22PM -0600, Grant Likely wrote:
> > >> > [...]
> > >> >> > Gah.  Not all devices instantiated via mfd will be an mfd device,
> > >> >> > which means that the driver may very well expect an *entirely
> > >> >> > different* platform_device pointer; which further means a very high
> > >> >> > potential of incorrectly dereferenced structures (as evidenced by a
> > >> >> > patch series that is not bisectable).  For instance, the xilinx ip
> > >> >> > cores are used by more than just mfd.
> > >> >> I agree. Since the vast majority of the MFD subdevices are MFD
> > >> >> specific IPs, I overlooked that part. The impacted drivers are the
> > >> >> timberdale and the DaVinci voice codec ones.
> > >>
> > >> Another option is you could do this for MFD devices:
> > >>
> > >> struct mfd_device {
> > >>         struct platform_device pdev;
> > >>         struct mfd_cell *cell;
> > >> };
> > >>
> > >> However, that requires that drivers using the mfd_cell will *never*
> > >> get instantiated outside of the mfd infrastructure, and there is no
> > >> way to protect against this so it is probably a bad idea.
> > >>
> > >> Or, mfd_cell could be added to platform_device directly which would
> > >> *by far* be the safest option at the cost of every platform_device
> > >> having a mostly unused mfd_cell pointer.  Not a significant cost in my
> > >> opinion.
> > > I thought about this one, but I had the impression people would want to kill
> > > me for adding an MFD specific pointer to platform_device. I guess it's worth
> > > giving it a try since it would be a simple and safe solution.
> > > I'll look at it later this weekend.
> > >
> > > Thanks for the input.
> > 
> > [cc'ing gregkh because we're talking about modifying struct platform_device]
> > 
> > I'll back you up on this one.  It is a far better solution than the
> > alternatives.  At least with mfd, it covers a large set of devices.  I
> > think there is a strong argument for doing this.  Or alternatively,
> > the particular interesting fields from mfd_cell could be added to
> > platform_device.  What information do child devices need access to?
> In some cases, they need the whole cell to clone it. So I'm up for adding an
> mfd_cell pointer to the platform_device structure.
> Below is a tentative patch. This is a first step and would fix all
> regressions. I tried to keep the MFD dependencies as small as possible, which
> is why I placed the pdev->mfd_cell building code in mfd-core.c

Okay.

> The second step would be to get rid of mfd_get_data() and have all subdrivers
> going back to the regular platform_data way. They would no longer be dependent
> on the MFD code except for those who really need it. In that case they could
> just call mfd_get_cell() and get full access to their MFD cell.

The revert to platform_data needs to happen ASAP though.  If this
second step isn't ready really quickly, then the current patches
should be reverted to give some breathing room for creating the
replacement patches.  However, it's not such a rush if the below
patch really does eliminate all of the nastiness of the original
series. (I haven't looked at a rolled-up diff of the first series and
this change, so I don't know for sure.)

In principle I agree with this patch.  Some comments below.

g.

> 
> --- 
>  drivers/mfd/mfd-core.c          |   27 ++++++++++++++++++++++-----
>  include/linux/mfd/core.h        |    7 +++++--
>  include/linux/platform_device.h |    5 +++++
>  3 files changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/mfd/mfd-core.c b/drivers/mfd/mfd-core.c
> index d01574d..c0fc1c0 100644
> --- a/drivers/mfd/mfd-core.c
> +++ b/drivers/mfd/mfd-core.c
> @@ -18,6 +18,21 @@
>  #include <linux/pm_runtime.h>
>  #include <linux/slab.h>
>  
> +static int mfd_platform_add_cell(struct platform_device *pdev, const struct mfd_cell *cell)
> +{
> +	struct mfd_cell *c;
> +
> +	if (cell == NULL)
> +		return 0;
> +
> +	c = kmemdup(cell, sizeof(struct mfd_cell), GFP_KERNEL);
> +	if (c == NULL)
> +		return -ENOMEM;
> +
> +	pdev->mfd_cell = c;
> +	return 0;
> +}

sizeof(*cell) is a teensy bit safer.  Also, this can be more concise:

static int mfd_platform_add_cell(struct platform_device *pdev,
				 const struct mfd_cell *cell)
{
	if (!cell)
		return 0;

	pdev->mfd_cell = kmemdup(cell, sizeof(*cell), GFP_KERNEL);
	return pdev->mfd_cell ? 0 : -ENOMEM;
}
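
As a userspace illustration of the same pattern, here is a minimal sketch.
struct cell, struct pdev, dup_bytes() and the other helpers are
hypothetical stand-ins (dup_bytes() plays the role of kmemdup()); none of
this is the MFD API. It shows why sizeof(*cell) keeps the copy size tied
to the pointer's type, and why the copy has to be freed by whoever
releases the device.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-ins for mfd_cell and platform_device. */
struct cell {
	const char *name;
	int id;
};

struct pdev {
	struct cell *cell;
};

/* Userspace stand-in for kmemdup(): allocate len bytes and copy src. */
static void *dup_bytes(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Attach a private copy of the cell; a NULL cell is not an error. */
static int pdev_add_cell(struct pdev *pdev, const struct cell *cell)
{
	assert(pdev != NULL);

	if (!cell)
		return 0;

	/* sizeof(*cell) follows the type of 'cell' if it ever changes. */
	pdev->cell = dup_bytes(cell, sizeof(*cell));
	return pdev->cell ? 0 : -ENOMEM;
}

/* Release path: the device owns the copy, so it is freed here. */
static void pdev_release(struct pdev *pdev)
{
	free(pdev->cell);
	pdev->cell = NULL;
}
```

Because the device holds its own copy, later changes to the template cell
do not affect it, and freeing it in a single release function avoids
scattering kfree() calls over the error and removal paths.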

> +
>  int mfd_cell_enable(struct platform_device *pdev)
>  {
>  	const struct mfd_cell *cell = mfd_get_cell(pdev);
> @@ -75,7 +90,7 @@ static int mfd_add_device(struct device *parent, int id,
>  
>  	pdev->dev.parent = parent;
>  
> -	ret = platform_device_add_data(pdev, cell, sizeof(*cell));
> +	ret = mfd_platform_add_cell(pdev, cell);
>  	if (ret)
>  		goto fail_res;
>  
> @@ -104,17 +119,17 @@ static int mfd_add_device(struct device *parent, int id,
>  		if (!cell->ignore_resource_conflicts) {
>  			ret = acpi_check_resource_conflict(res);
>  			if (ret)
> -				goto fail_res;
> +				goto fail_cell;
>  		}
>  	}
>  
>  	ret = platform_device_add_resources(pdev, res, cell->num_resources);
>  	if (ret)
> -		goto fail_res;
> +		goto fail_cell;
>  
>  	ret = platform_device_add(pdev);
>  	if (ret)
> -		goto fail_res;
> +		goto fail_cell;
>  
>  	if (cell->pm_runtime_no_callbacks)
>  		pm_runtime_no_callbacks(&pdev->dev);
> @@ -123,7 +138,8 @@ static int mfd_add_device(struct device *parent, int id,
>  
>  	return 0;
>  
> -/*	platform_device_del(pdev); */
> +fail_cell:
> +	kfree(pdev->mfd_cell);

Looks like kfreeing the cell should become part of the
platform_device_release() function.  Which would remove it from here,
and also ...

>  fail_res:
>  	kfree(res);
>  fail_device:
> @@ -171,6 +187,7 @@ static int mfd_remove_devices_fn(struct device *dev, void *c)
>  	if (!*usage_count || (cell->usage_count < *usage_count))
>  		*usage_count = cell->usage_count;
>  
> +	kfree(pdev->mfd_cell);

... from here.

>  	platform_device_unregister(pdev);
>  	return 0;
>  }
> diff --git a/include/linux/mfd/core.h b/include/linux/mfd/core.h
> index ad1b19a..0e4d3a6 100644
> --- a/include/linux/mfd/core.h
> +++ b/include/linux/mfd/core.h
> @@ -86,7 +86,7 @@ extern int mfd_clone_cell(const char *cell, const char **clones,
>   */
>  static inline const struct mfd_cell *mfd_get_cell(struct platform_device *pdev)
>  {
> -	return pdev->dev.platform_data;
> +	return pdev->mfd_cell;
>  }
>  
>  /*
> @@ -95,7 +95,10 @@ static inline const struct mfd_cell *mfd_get_cell(struct platform_device *pdev)
>   */
>  static inline void *mfd_get_data(struct platform_device *pdev)
>  {
> -	return mfd_get_cell(pdev)->mfd_data;
> +	if (pdev->mfd_cell != NULL)
> +		return mfd_get_cell(pdev)->mfd_data;
> +	else
> +		return pdev->dev.platform_data;

Blech!  Yeah, this should become consistent that platform data
*always* comes from pdev->dev.platform_data.

>  }
>  
>  extern int mfd_add_devices(struct device *parent, int id,
> diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
> index d96db98..734d254 100644
> --- a/include/linux/platform_device.h
> +++ b/include/linux/platform_device.h
> @@ -14,6 +14,8 @@
>  #include <linux/device.h>
>  #include <linux/mod_devicetable.h>
>  
> +struct mfd_cell;
> +
>  struct platform_device {
>  	const char	* name;
>  	int		id;
> @@ -23,6 +25,9 @@ struct platform_device {
>  
>  	const struct platform_device_id	*id_entry;
>  
> +	/* MFD cell pointer */
> +	struct mfd_cell	*mfd_cell;
> +

Move this down next to the of_node pointer.  May as well collect all the
supplemental data about the device in the same place.

>  	/* arch specific additions */
>  	struct pdev_archdata	archdata;
>  };
> 
> -- 
> Intel Open Source Technology Centre
> http://oss.intel.com/


* [PATCH 4/8] s2io: convert to set_phys_id (v2)
From: Stephen Hemminger @ 2011-04-05  1:09 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David S. Miller, Jon Mason, netdev
In-Reply-To: <1301959036.2935.58.camel@localhost>

Convert to the new ethtool set_phys_id model. Remove the no-longer-used
timer, and fix the docbook comment.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
v2 - store last gpio value in unused part of s2io_nic
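
For anyone following the conversion, here is a rough userspace sketch of
the callback flow: save the register when identification starts, set or
clear the LED bit on the on/off calls, and put the saved value back when
identification ends. Everything named below (the enum, struct fake_nic,
LED_BIT, the plain variable standing in for the GPIO register) is made up
for illustration; the driver itself goes through readq()/writeq() on bar0.
The sketch also returns 0 from the ACTIVE case and always restores on
INACTIVE, whereas the patch returns -EINVAL from ACTIVE and restores only
for CARDS_WITH_FAULTY_LINK_INDICATORS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ethtool identification states. */
enum phys_id_state { ID_INACTIVE, ID_ACTIVE, ID_ON, ID_OFF };

/* Assumed bit position, not the real GPIO_CTRL_GPIO_0 value. */
#define LED_BIT (UINT64_C(1) << 0)

struct fake_nic {
	uint64_t gpio_control;	/* models the device register */
	uint64_t saved_ctrl;	/* plays the role of sp->adapt_ctrl_org */
};

/* Set or clear the LED bit explicitly instead of toggling it. */
void fake_set_led(struct fake_nic *nic, bool on)
{
	if (on)
		nic->gpio_control |= LED_BIT;
	else
		nic->gpio_control &= ~LED_BIT;
}

/* Returns 0 on success, -1 for a state the sketch does not know. */
int fake_set_phys_id(struct fake_nic *nic, enum phys_id_state state)
{
	switch (state) {
	case ID_ACTIVE:
		/* Remember the register so it can be restored later. */
		nic->saved_ctrl = nic->gpio_control;
		return 0;
	case ID_ON:
		fake_set_led(nic, true);
		return 0;
	case ID_OFF:
		fake_set_led(nic, false);
		return 0;
	case ID_INACTIVE:
		/* Undo whatever the blinking left behind. */
		nic->gpio_control = nic->saved_ctrl;
		return 0;
	}
	return -1;
}
```

The point of the explicit on/off helper is that the caller, not a driver
timer, now decides the LED state on every call, so a toggle would no longer
be well defined.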

--- a/drivers/net/s2io.c	2011-04-04 18:02:15.842923454 -0700
+++ b/drivers/net/s2io.c	2011-04-04 18:06:53.070404512 -0700
@@ -5484,83 +5484,79 @@ static void s2io_ethtool_gregs(struct ne
 	}
 }
 
-/**
- *  s2io_phy_id  - timer function that alternates adapter LED.
- *  @data : address of the private member of the device structure, which
- *  is a pointer to the s2io_nic structure, provided as an u32.
- * Description: This is actually the timer function that alternates the
- * adapter LED bit of the adapter control bit to set/reset every time on
- * invocation. The timer is set for 1/2 a second, hence tha NIC blinks
- *  once every second.
+/*
+ *  s2io_set_led - control NIC led
  */
-static void s2io_phy_id(unsigned long data)
+static void s2io_set_led(struct s2io_nic *sp, bool on)
 {
-	struct s2io_nic *sp = (struct s2io_nic *)data;
 	struct XENA_dev_config __iomem *bar0 = sp->bar0;
-	u64 val64 = 0;
-	u16 subid;
+	u16 subid = sp->pdev->subsystem_device;
+	u64 val64;
 
-	subid = sp->pdev->subsystem_device;
 	if ((sp->device_type == XFRAME_II_DEVICE) ||
 	    ((subid & 0xFF) >= 0x07)) {
 		val64 = readq(&bar0->gpio_control);
-		val64 ^= GPIO_CTRL_GPIO_0;
+		if (on)
+			val64 |= GPIO_CTRL_GPIO_0;
+		else
+			val64 &= ~GPIO_CTRL_GPIO_0;
+
 		writeq(val64, &bar0->gpio_control);
 	} else {
 		val64 = readq(&bar0->adapter_control);
-		val64 ^= ADAPTER_LED_ON;
+		if (on)
+			val64 |= ADAPTER_LED_ON;
+		else
+			val64 &= ~ADAPTER_LED_ON;
+
 		writeq(val64, &bar0->adapter_control);
 	}
 
-	mod_timer(&sp->id_timer, jiffies + HZ / 2);
 }
 
 /**
- * s2io_ethtool_idnic - To physically identify the nic on the system.
- * @sp : private member of the device structure, which is a pointer to the
- * s2io_nic structure.
- * @id : pointer to the structure with identification parameters given by
- * ethtool.
+ * s2io_ethtool_set_led - To physically identify the nic on the system.
+ * @dev : network device
+ * @state: led setting
+ *
  * Description: Used to physically identify the NIC on the system.
  * The Link LED will blink for a time specified by the user for
  * identification.
  * NOTE: The Link has to be Up to be able to blink the LED. Hence
  * identification is possible only if it's link is up.
- * Return value:
- * int , returns 0 on success
  */
 
-static int s2io_ethtool_idnic(struct net_device *dev, u32 data)
+static int s2io_ethtool_set_led(struct net_device *dev,
+				enum ethtool_phys_id_state state)
 {
-	u64 val64 = 0, last_gpio_ctrl_val;
 	struct s2io_nic *sp = netdev_priv(dev);
 	struct XENA_dev_config __iomem *bar0 = sp->bar0;
-	u16 subid;
+	u16 subid = sp->pdev->subsystem_device;
 
-	subid = sp->pdev->subsystem_device;
-	last_gpio_ctrl_val = readq(&bar0->gpio_control);
 	if ((sp->device_type == XFRAME_I_DEVICE) && ((subid & 0xFF) < 0x07)) {
-		val64 = readq(&bar0->adapter_control);
+		u64 val64 = readq(&bar0->adapter_control);
 		if (!(val64 & ADAPTER_CNTL_EN)) {
 			pr_err("Adapter Link down, cannot blink LED\n");
-			return -EFAULT;
+			return -EAGAIN;
 		}
 	}
-	if (sp->id_timer.function == NULL) {
-		init_timer(&sp->id_timer);
-		sp->id_timer.function = s2io_phy_id;
-		sp->id_timer.data = (unsigned long)sp;
-	}
-	mod_timer(&sp->id_timer, jiffies);
-	if (data)
-		msleep_interruptible(data * HZ);
-	else
-		msleep_interruptible(MAX_FLICKER_TIME);
-	del_timer_sync(&sp->id_timer);
 
-	if (CARDS_WITH_FAULTY_LINK_INDICATORS(sp->device_type, subid)) {
-		writeq(last_gpio_ctrl_val, &bar0->gpio_control);
-		last_gpio_ctrl_val = readq(&bar0->gpio_control);
+	switch (state) {
+	case ETHTOOL_ID_ACTIVE:
+		sp->adapt_ctrl_org = readq(&bar0->gpio_control);
+		return -EINVAL;
+
+	case ETHTOOL_ID_ON:
+		s2io_set_led(sp, true);
+		break;
+
+	case ETHTOOL_ID_OFF:
+		s2io_set_led(sp, false);
+		break;
+
+	case ETHTOOL_ID_INACTIVE:
+		if (CARDS_WITH_FAULTY_LINK_INDICATORS(sp->device_type, subid))
+			writeq(sp->adapt_ctrl_org, &bar0->gpio_control);
 	}
 
 	return 0;
@@ -6776,7 +6772,7 @@ static const struct ethtool_ops netdev_e
 	.set_ufo = ethtool_op_set_ufo,
 	.self_test = s2io_ethtool_test,
 	.get_strings = s2io_ethtool_get_strings,
-	.phys_id = s2io_ethtool_idnic,
+	.set_phys_id = s2io_ethtool_set_led,
 	.get_ethtool_stats = s2io_get_ethtool_stats,
 	.get_sset_count = s2io_get_sset_count,
 };
--- a/drivers/net/s2io.h	2011-04-04 18:02:15.878923843 -0700
+++ b/drivers/net/s2io.h	2011-04-04 18:06:40.518243198 -0700
@@ -893,9 +893,6 @@ struct s2io_nic {
 	u16 all_multi_pos;
 	u16 promisc_flg;
 
-	/*  Id timer, used to blink NIC to physically identify NIC. */
-	struct timer_list id_timer;
-
 	/*  Restart timer, used to restart NIC if the device is stuck and
 	 *  a schedule task that will set the correct Link state once the
 	 *  NIC's PHY has stabilized after a state change.


* Re: [PATCH 4/8] s2io: convert to set_phys_id
From: Stephen Hemminger @ 2011-04-05  1:08 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David S. Miller, Jon Mason, netdev
In-Reply-To: <1301959036.2935.58.camel@localhost>

		last_gpio_ctrl_val = readq(&bar0->gpio_control);
> [...]
> 
> I think last_gpio_ctrl_val needs to be moved to struct s2io_nic and
> initialised only in the ETHTOOL_ID_ACTIVE case.

Strange, there is already a field there that is unused:
   sp->adapt_ctrl_org


* Re: [PATCH net-next-2.6 5/6] ethtool: Change ETHTOOL_PHYS_ID implementation to allow dropping RTNL
From: Stephen Hemminger @ 2011-04-05  1:01 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, netdev, linux-net-drivers,
	Michał Mirosław
In-Reply-To: <1301959591.2935.64.camel@localhost>

On Tue, 05 Apr 2011 00:26:31 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> This reimplementation lets us blink LEDs on multiple devices at the same
> time, but that's pretty pointless.  The nasty thing is we could try to
> blink LEDs twice over on the same device, violating the rules that the
> drivers depend on.  So I think I need to add:
> 

This looks sane. Is there a good way to fix the qlge and cxgb4 drivers
and get rid of the old interface?

-- 


* Re: [PATCH 0/7] bridge enhancements for net-next
From: David Miller @ 2011-04-05  0:23 UTC (permalink / raw)
  To: shemminger; +Cc: netdev
In-Reply-To: <20110405000326.714524584@vyatta.com>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 04 Apr 2011 17:03:26 -0700

> These patches add more netlink support for bridge.
> It is possible to do basic bridge configuration with just netlink.
> Later enhancements will add statistics and parameters.
> 
> The intention is to switch to pure netlink in future and support
> RSTP and deprecate the old ioctl, sysfs and STP code.

Looks good, all applied, thanks Stephen.


* [Announce] New IPVS GIT trees
From: Simon Horman @ 2011-04-05  0:24 UTC (permalink / raw)
  To: lvs-devel, netdev, netfilter-devel, netfilter
  Cc: Hans Schillstrom, Julian Anastasov, Wensong Zhang,
	Patrick McHardy, David Miller

Hi,

I would like to announce that I have created two new trees
to help me maintain the flow of changes to IPVS.

1) ipvs-2.6

   The master branch of this tree is based on the master
   branch of Patrick's nf-2.6 tree. Patches for rc kernels,
   that is, bug fixes, should be made against the master branch.

   I intend to add some other branches for backports and the like.
   Temporary branches, such as for-patrick, will come and
   go as the need arises.

   git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-2.6.git

2) ipvs-next-2.6

   The master branch of this tree is based on the master
   branch of Patrick's nf-next-2.6 tree. Patches for the next/current
   merge window should be made against the master branch.
   This is where the development of new features should be done.

   My intention is to close this tree to new features during the merge window.

   I may add topic branches as the need arises.
   Temporary branches, such as for-patrick, will come and
   go as the need arises.

   git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next-2.6.git

My lvs-test-2.6 tree is now deprecated.  Please avoid using it if possible.
I intend to delete it in the not too distant future.


* [PATCH 7/7] bridge: range check STP parameters
From: Stephen Hemminger @ 2011-04-05  0:03 UTC (permalink / raw)
  To: David S. Miller, Sasikanth V; +Cc: netdev
In-Reply-To: <20110405000326.714524584@vyatta.com>

[-- Attachment #1: br-param.patch --]
[-- Type: text/plain, Size: 13447 bytes --]

Apply restrictions on STP parameters based on the 802.1D-1998 standard.
   * Fixes missing locking in set path cost ioctl
   * Uses common code for both ioctl and sysfs

This is based on an earlier patch by Sasikanth V, but overhauled.

Note:
1. It does NOT enforce the restriction on the relationship between max_age
   and forward delay or hello time, because in the existing implementation
   these are set as independent operations.

2. If STP is disabled, there is no restriction on forward delay.

3. No restriction on holding time because users use Linux code to act
   as hub or be sticky.

4. Although the standard allows 0-255, Linux only allows 0-63 for port
   priority because more bits are reserved for port number.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


---
 net/bridge/br_ioctl.c       |   40 ++++++++----------------------------
 net/bridge/br_private.h     |   13 ++++++++---
 net/bridge/br_private_stp.h |   13 +++++++++++
 net/bridge/br_stp.c         |   48 ++++++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_stp_if.c      |   21 +++++++++++++++----
 net/bridge/br_sysfs_br.c    |   39 ++---------------------------------
 net/bridge/br_sysfs_if.c    |   26 +++++++----------------
 7 files changed, 107 insertions(+), 93 deletions(-)
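
To make the limits described above concrete, here is a small standalone
sketch of the range checks, written for userspace. It is not the kernel
code: SKETCH_HZ and SKETCH_PORT_BITS are assumed values picked for
illustration (the latter chosen so the priority range comes out as the
0-63 mentioned in note 4), the sketch_* names are invented, and the
arguments are taken to be already converted to ticks.

```c
#include <errno.h>

#define SKETCH_HZ 100UL		/* assumed tick rate, illustration only */
#define SKETCH_PORT_BITS 10	/* assumed; yields the 0-63 range of note 4 */

/* Inclusive range test shared by every setter. */
int sketch_check_range(unsigned long v, unsigned long lo, unsigned long hi)
{
	return (v < lo || v > hi) ? -ERANGE : 0;
}

/* Hello time: 1 to 10 seconds. */
int sketch_check_hello_time(unsigned long ticks)
{
	return sketch_check_range(ticks, 1 * SKETCH_HZ, 10 * SKETCH_HZ);
}

/* Max age: 6 to 40 seconds. */
int sketch_check_max_age(unsigned long ticks)
{
	return sketch_check_range(ticks, 6 * SKETCH_HZ, 40 * SKETCH_HZ);
}

/* Forward delay: 2 to 30 seconds, but only while STP is enabled (note 2). */
int sketch_check_forward_delay(unsigned long ticks, int stp_enabled)
{
	if (!stp_enabled)
		return 0;
	return sketch_check_range(ticks, 2 * SKETCH_HZ, 30 * SKETCH_HZ);
}

/* Path cost: 1 to 65535. */
int sketch_check_path_cost(unsigned long cost)
{
	return sketch_check_range(cost, 1, 65535);
}

/* Port priority: whatever fits in the bits left over after the port number. */
int sketch_check_port_priority(unsigned long prio)
{
	unsigned long max = 0xFFFFUL >> SKETCH_PORT_BITS;

	return sketch_check_range(prio, 0, max);
}
```

With these helpers a hello time of half a second or a path cost of 0 is
rejected with -ERANGE, while a forward delay of 0 is accepted as long as
STP is off.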

--- a/net/bridge/br_ioctl.c	2011-04-04 15:46:46.822291708 -0700
+++ b/net/bridge/br_ioctl.c	2011-04-04 16:34:29.164630581 -0700
@@ -181,40 +181,19 @@ static int old_dev_ioctl(struct net_devi
 		if (!capable(CAP_NET_ADMIN))
 			return -EPERM;
 
-		spin_lock_bh(&br->lock);
-		br->bridge_forward_delay = clock_t_to_jiffies(args[1]);
-		if (br_is_root_bridge(br))
-			br->forward_delay = br->bridge_forward_delay;
-		spin_unlock_bh(&br->lock);
-		return 0;
+		return br_set_forward_delay(br, args[1]);
 
 	case BRCTL_SET_BRIDGE_HELLO_TIME:
-	{
-		unsigned long t = clock_t_to_jiffies(args[1]);
 		if (!capable(CAP_NET_ADMIN))
 			return -EPERM;
 
-		if (t < HZ)
-			return -EINVAL;
-
-		spin_lock_bh(&br->lock);
-		br->bridge_hello_time = t;
-		if (br_is_root_bridge(br))
-			br->hello_time = br->bridge_hello_time;
-		spin_unlock_bh(&br->lock);
-		return 0;
-	}
+		return br_set_hello_time(br, args[1]);
 
 	case BRCTL_SET_BRIDGE_MAX_AGE:
 		if (!capable(CAP_NET_ADMIN))
 			return -EPERM;
 
-		spin_lock_bh(&br->lock);
-		br->bridge_max_age = clock_t_to_jiffies(args[1]);
-		if (br_is_root_bridge(br))
-			br->max_age = br->bridge_max_age;
-		spin_unlock_bh(&br->lock);
-		return 0;
+		return br_set_max_age(br, args[1]);
 
 	case BRCTL_SET_AGEING_TIME:
 		if (!capable(CAP_NET_ADMIN))
@@ -275,19 +254,16 @@ static int old_dev_ioctl(struct net_devi
 	case BRCTL_SET_PORT_PRIORITY:
 	{
 		struct net_bridge_port *p;
-		int ret = 0;
+		int ret;
 
 		if (!capable(CAP_NET_ADMIN))
 			return -EPERM;
 
-		if (args[2] >= (1<<(16-BR_PORT_BITS)))
-			return -ERANGE;
-
 		spin_lock_bh(&br->lock);
 		if ((p = br_get_port(br, args[1])) == NULL)
 			ret = -EINVAL;
 		else
-			br_stp_set_port_priority(p, args[2]);
+			ret = br_stp_set_port_priority(p, args[2]);
 		spin_unlock_bh(&br->lock);
 		return ret;
 	}
@@ -295,15 +271,17 @@ static int old_dev_ioctl(struct net_devi
 	case BRCTL_SET_PATH_COST:
 	{
 		struct net_bridge_port *p;
-		int ret = 0;
+		int ret;
 
 		if (!capable(CAP_NET_ADMIN))
 			return -EPERM;
 
+		spin_lock_bh(&br->lock);
 		if ((p = br_get_port(br, args[1])) == NULL)
 			ret = -EINVAL;
 		else
-			br_stp_set_path_cost(p, args[2]);
+			ret = br_stp_set_path_cost(p, args[2]);
+		spin_unlock_bh(&br->lock);
 
 		return ret;
 	}
--- a/net/bridge/br_private.h	2011-04-04 15:39:32.777614097 -0700
+++ b/net/bridge/br_private.h	2011-04-04 16:39:55.816195002 -0700
@@ -495,6 +495,11 @@ extern struct net_bridge_port *br_get_po
 extern void br_init_port(struct net_bridge_port *p);
 extern void br_become_designated_port(struct net_bridge_port *p);
 
+extern int br_set_forward_delay(struct net_bridge *br, unsigned long x);
+extern int br_set_hello_time(struct net_bridge *br, unsigned long x);
+extern int br_set_max_age(struct net_bridge *br, unsigned long x);
+
+
 /* br_stp_if.c */
 extern void br_stp_enable_bridge(struct net_bridge *br);
 extern void br_stp_disable_bridge(struct net_bridge *br);
@@ -505,10 +510,10 @@ extern bool br_stp_recalculate_bridge_id
 extern void br_stp_change_bridge_id(struct net_bridge *br, const unsigned char *a);
 extern void br_stp_set_bridge_priority(struct net_bridge *br,
 				       u16 newprio);
-extern void br_stp_set_port_priority(struct net_bridge_port *p,
-				     u8 newprio);
-extern void br_stp_set_path_cost(struct net_bridge_port *p,
-				 u32 path_cost);
+extern int br_stp_set_port_priority(struct net_bridge_port *p,
+				    unsigned long newprio);
+extern int br_stp_set_path_cost(struct net_bridge_port *p,
+				unsigned long path_cost);
 extern ssize_t br_show_bridge_id(char *buf, const struct bridge_id *id);
 
 /* br_stp_bpdu.c */
--- a/net/bridge/br_private_stp.h	2011-04-04 15:40:02.125930146 -0700
+++ b/net/bridge/br_private_stp.h	2011-04-04 16:21:05.960073263 -0700
@@ -16,6 +16,19 @@
 #define BPDU_TYPE_CONFIG 0
 #define BPDU_TYPE_TCN 0x80
 
+/* IEEE 802.1D-1998 timer values */
+#define BR_MIN_HELLO_TIME	(1*HZ)
+#define BR_MAX_HELLO_TIME	(10*HZ)
+
+#define BR_MIN_FORWARD_DELAY	(2*HZ)
+#define BR_MAX_FORWARD_DELAY	(30*HZ)
+
+#define BR_MIN_MAX_AGE		(6*HZ)
+#define BR_MAX_MAX_AGE		(40*HZ)
+
+#define BR_MIN_PATH_COST	1
+#define BR_MAX_PATH_COST	65535
+
 struct br_config_bpdu
 {
 	unsigned	topology_change:1;
--- a/net/bridge/br_stp.c	2011-04-04 15:47:54.707023927 -0700
+++ b/net/bridge/br_stp.c	2011-04-04 16:33:49.908201916 -0700
@@ -484,3 +484,51 @@ void br_received_tcn_bpdu(struct net_bri
 		br_topology_change_acknowledge(p);
 	}
 }
+
+/* Change bridge STP parameter */
+int br_set_hello_time(struct net_bridge *br, unsigned long val)
+{
+	unsigned long t = clock_t_to_jiffies(val);
+
+	if (t < BR_MIN_HELLO_TIME || t > BR_MAX_HELLO_TIME)
+		return -ERANGE;
+
+	spin_lock_bh(&br->lock);
+	br->bridge_hello_time = t;
+	if (br_is_root_bridge(br))
+		br->hello_time = br->bridge_hello_time;
+	spin_unlock_bh(&br->lock);
+	return 0;
+}
+
+int br_set_max_age(struct net_bridge *br, unsigned long val)
+{
+	unsigned long t = clock_t_to_jiffies(val);
+
+	if (t < BR_MIN_MAX_AGE || t > BR_MAX_MAX_AGE)
+		return -ERANGE;
+
+	spin_lock_bh(&br->lock);
+	br->bridge_max_age = t;
+	if (br_is_root_bridge(br))
+		br->max_age = br->bridge_max_age;
+	spin_unlock_bh(&br->lock);
+	return 0;
+
+}
+
+int br_set_forward_delay(struct net_bridge *br, unsigned long val)
+{
+	unsigned long t = clock_t_to_jiffies(val);
+
+	if (br->stp_enabled != BR_NO_STP &&
+	    (t < BR_MIN_FORWARD_DELAY || t > BR_MAX_FORWARD_DELAY))
+		return -ERANGE;
+
+	spin_lock_bh(&br->lock);
+	br->bridge_forward_delay = t;
+	if (br_is_root_bridge(br))
+		br->forward_delay = br->bridge_forward_delay;
+	spin_unlock_bh(&br->lock);
+	return 0;
+}
--- a/net/bridge/br_stp_if.c	2011-04-04 16:11:56.538282617 -0700
+++ b/net/bridge/br_stp_if.c	2011-04-04 16:33:49.888201699 -0700
@@ -20,7 +20,7 @@
 
 
 /* Port id is composed of priority and port number.
- * NB: least significant bits of priority are dropped to
+ * NB: some bits of priority are dropped to
  *     make room for more ports.
  */
 static inline port_id br_make_port_id(__u8 priority, __u16 port_no)
@@ -29,6 +29,8 @@ static inline port_id br_make_port_id(__
 		| (port_no & ((1<<BR_PORT_BITS)-1));
 }
 
+#define BR_MAX_PORT_PRIORITY ((u16)~0 >> BR_PORT_BITS)
+
 /* called under bridge lock */
 void br_init_port(struct net_bridge_port *p)
 {
@@ -255,10 +257,14 @@ void br_stp_set_bridge_priority(struct n
 }
 
 /* called under bridge lock */
-void br_stp_set_port_priority(struct net_bridge_port *p, u8 newprio)
+int br_stp_set_port_priority(struct net_bridge_port *p, unsigned long newprio)
 {
-	port_id new_port_id = br_make_port_id(newprio, p->port_no);
+	port_id new_port_id;
+
+	if (newprio > BR_MAX_PORT_PRIORITY)
+		return -ERANGE;
 
+	new_port_id = br_make_port_id(newprio, p->port_no);
 	if (br_is_designated_port(p))
 		p->designated_port = new_port_id;
 
@@ -269,14 +275,21 @@ void br_stp_set_port_priority(struct net
 		br_become_designated_port(p);
 		br_port_state_selection(p->br);
 	}
+
+	return 0;
 }
 
 /* called under bridge lock */
-void br_stp_set_path_cost(struct net_bridge_port *p, u32 path_cost)
+int br_stp_set_path_cost(struct net_bridge_port *p, unsigned long path_cost)
 {
+	if (path_cost < BR_MIN_PATH_COST ||
+	    path_cost > BR_MAX_PATH_COST)
+		return -ERANGE;
+
 	p->path_cost = path_cost;
 	br_configuration_update(p->br);
 	br_port_state_selection(p->br);
+	return 0;
 }
 
 ssize_t br_show_bridge_id(char *buf, const struct bridge_id *id)
--- a/net/bridge/br_sysfs_br.c	2011-04-04 15:56:51.112815302 -0700
+++ b/net/bridge/br_sysfs_br.c	2011-04-04 15:59:07.710269828 -0700
@@ -43,9 +43,7 @@ static ssize_t store_bridge_parm(struct
 	if (endp == buf)
 		return -EINVAL;
 
-	spin_lock_bh(&br->lock);
 	err = (*set)(br, val);
-	spin_unlock_bh(&br->lock);
 	return err ? err : len;
 }
 
@@ -57,20 +55,11 @@ static ssize_t show_forward_delay(struct
 	return sprintf(buf, "%lu\n", jiffies_to_clock_t(br->forward_delay));
 }
 
-static int set_forward_delay(struct net_bridge *br, unsigned long val)
-{
-	unsigned long delay = clock_t_to_jiffies(val);
-	br->forward_delay = delay;
-	if (br_is_root_bridge(br))
-		br->bridge_forward_delay = delay;
-	return 0;
-}
-
 static ssize_t store_forward_delay(struct device *d,
 				   struct device_attribute *attr,
 				   const char *buf, size_t len)
 {
-	return store_bridge_parm(d, buf, len, set_forward_delay);
+	return store_bridge_parm(d, buf, len, br_set_forward_delay);
 }
 static DEVICE_ATTR(forward_delay, S_IRUGO | S_IWUSR,
 		   show_forward_delay, store_forward_delay);
@@ -82,24 +71,11 @@ static ssize_t show_hello_time(struct de
 		       jiffies_to_clock_t(to_bridge(d)->hello_time));
 }
 
-static int set_hello_time(struct net_bridge *br, unsigned long val)
-{
-	unsigned long t = clock_t_to_jiffies(val);
-
-	if (t < HZ)
-		return -EINVAL;
-
-	br->hello_time = t;
-	if (br_is_root_bridge(br))
-		br->bridge_hello_time = t;
-	return 0;
-}
-
 static ssize_t store_hello_time(struct device *d,
 				struct device_attribute *attr, const char *buf,
 				size_t len)
 {
-	return store_bridge_parm(d, buf, len, set_hello_time);
+	return store_bridge_parm(d, buf, len, br_set_hello_time);
 }
 static DEVICE_ATTR(hello_time, S_IRUGO | S_IWUSR, show_hello_time,
 		   store_hello_time);
@@ -111,19 +87,10 @@ static ssize_t show_max_age(struct devic
 		       jiffies_to_clock_t(to_bridge(d)->max_age));
 }
 
-static int set_max_age(struct net_bridge *br, unsigned long val)
-{
-	unsigned long t = clock_t_to_jiffies(val);
-	br->max_age = t;
-	if (br_is_root_bridge(br))
-		br->bridge_max_age = t;
-	return 0;
-}
-
 static ssize_t store_max_age(struct device *d, struct device_attribute *attr,
 			     const char *buf, size_t len)
 {
-	return store_bridge_parm(d, buf, len, set_max_age);
+	return store_bridge_parm(d, buf, len, br_set_max_age);
 }
 static DEVICE_ATTR(max_age, S_IRUGO | S_IWUSR, show_max_age, store_max_age);
 
--- a/net/bridge/br_sysfs_if.c	2011-04-04 16:11:07.277767864 -0700
+++ b/net/bridge/br_sysfs_if.c	2011-04-04 16:32:56.123614502 -0700
@@ -23,7 +23,7 @@
 struct brport_attribute {
 	struct attribute	attr;
 	ssize_t (*show)(struct net_bridge_port *, char *);
-	ssize_t (*store)(struct net_bridge_port *, unsigned long);
+	int (*store)(struct net_bridge_port *, unsigned long);
 };
 
 #define BRPORT_ATTR(_name,_mode,_show,_store)		        \
@@ -38,27 +38,17 @@ static ssize_t show_path_cost(struct net
 {
 	return sprintf(buf, "%d\n", p->path_cost);
 }
-static ssize_t store_path_cost(struct net_bridge_port *p, unsigned long v)
-{
-	br_stp_set_path_cost(p, v);
-	return 0;
-}
+
 static BRPORT_ATTR(path_cost, S_IRUGO | S_IWUSR,
-		   show_path_cost, store_path_cost);
+		   show_path_cost, br_stp_set_path_cost);
 
 static ssize_t show_priority(struct net_bridge_port *p, char *buf)
 {
 	return sprintf(buf, "%d\n", p->priority);
 }
-static ssize_t store_priority(struct net_bridge_port *p, unsigned long v)
-{
-	if (v >= (1<<(16-BR_PORT_BITS)))
-		return -ERANGE;
-	br_stp_set_port_priority(p, v);
-	return 0;
-}
+
 static BRPORT_ATTR(priority, S_IRUGO | S_IWUSR,
-			 show_priority, store_priority);
+			 show_priority, br_stp_set_port_priority);
 
 static ssize_t show_designated_root(struct net_bridge_port *p, char *buf)
 {
@@ -136,7 +126,7 @@ static ssize_t show_hold_timer(struct ne
 }
 static BRPORT_ATTR(hold_timer, S_IRUGO, show_hold_timer, NULL);
 
-static ssize_t store_flush(struct net_bridge_port *p, unsigned long v)
+static int store_flush(struct net_bridge_port *p, unsigned long v)
 {
 	br_fdb_delete_by_port(p->br, p, 0); // Don't delete local entry
 	return 0;
@@ -148,7 +138,7 @@ static ssize_t show_hairpin_mode(struct
 	int hairpin_mode = (p->flags & BR_HAIRPIN_MODE) ? 1 : 0;
 	return sprintf(buf, "%d\n", hairpin_mode);
 }
-static ssize_t store_hairpin_mode(struct net_bridge_port *p, unsigned long v)
+static int store_hairpin_mode(struct net_bridge_port *p, unsigned long v)
 {
 	if (v)
 		p->flags |= BR_HAIRPIN_MODE;
@@ -165,7 +155,7 @@ static ssize_t show_multicast_router(str
 	return sprintf(buf, "%d\n", p->multicast_router);
 }
 
-static ssize_t store_multicast_router(struct net_bridge_port *p,
+static int store_multicast_router(struct net_bridge_port *p,
 				      unsigned long v)
 {
 	return br_multicast_set_port_router(p, v);




* [PATCH 4/7] bridge: add netlink notification on forward entry changes
From: Stephen Hemminger @ 2011-04-05  0:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20110405000326.714524584@vyatta.com>

[-- Attachment #1: br-fdb-notify.patch --]
[-- Type: text/plain, Size: 5256 bytes --]

This allows applications to query and monitor the bridge forwarding
table using the same method as for the neighbor table. The forwarding
table entries are returned in the same structure format as used by the
ioctl. If more information is desired in the future, the netlink method
is extensible.

Example (using bridge extensions to iproute2)
  # br monitor

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


---
 net/bridge/br_fdb.c     |  125 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_netlink.c |    1 
 net/bridge/br_private.h |    1 
 3 files changed, 127 insertions(+)
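
As a reading aid, the way an entry's flags map onto neighbor states in the
notification can be sketched in plain C. This is a userspace
approximation, not the kernel function: struct sketch_fdb_entry and the
sketch_* names are invented, the SK_NUD_* values are copied from the NUD_*
constants in <linux/neighbour.h>, and expiry is simplified to a plain
comparison against an ageing time passed in by the caller.

```c
#include <stdbool.h>

/* Values mirror the NUD_* constants in <linux/neighbour.h>. */
#define SK_NUD_REACHABLE 0x02
#define SK_NUD_STALE     0x04
#define SK_NUD_NOARP     0x40
#define SK_NUD_PERMANENT 0x80

/* Hypothetical cut-down forwarding entry. */
struct sketch_fdb_entry {
	bool is_local;		/* address belongs to the bridge itself */
	bool is_static;		/* entry was added by hand and never ages */
	unsigned long updated;	/* tick count of the last update */
};

/* Simplified expiry test: the entry is older than the ageing time. */
bool sketch_has_expired(const struct sketch_fdb_entry *e,
			unsigned long now, unsigned long ageing_time)
{
	return !e->is_static && (now - e->updated) >= ageing_time;
}

/* Pick the neighbor state reported for an entry. */
int sketch_fdb_to_nud(const struct sketch_fdb_entry *e,
		      unsigned long now, unsigned long ageing_time)
{
	if (e->is_local)
		return SK_NUD_PERMANENT;
	if (e->is_static)
		return SK_NUD_NOARP;
	if (sketch_has_expired(e, now, ageing_time))
		return SK_NUD_STALE;
	return SK_NUD_REACHABLE;
}
```

A monitoring tool can then treat bridge entries like any other neighbor
record: permanent for local addresses, noarp for static ones, and
reachable or stale for learned ones depending on age.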

--- a/net/bridge/br_fdb.c	2011-03-21 10:37:11.872443564 -0700
+++ b/net/bridge/br_fdb.c	2011-03-21 13:00:35.370484883 -0700
@@ -28,6 +28,7 @@
 static struct kmem_cache *br_fdb_cache __read_mostly;
 static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
 		      const unsigned char *addr);
+static void fdb_notify(const struct net_bridge_fdb_entry *, int);
 
 static u32 fdb_salt __read_mostly;
 
@@ -81,6 +82,7 @@ static void fdb_rcu_free(struct rcu_head
 
 static inline void fdb_delete(struct net_bridge_fdb_entry *f)
 {
+	fdb_notify(f, RTM_DELNEIGH);
 	hlist_del_rcu(&f->hlist);
 	call_rcu(&f->rcu, fdb_rcu_free);
 }
@@ -345,6 +347,7 @@ static struct net_bridge_fdb_entry *fdb_
 		fdb->is_static = 0;
 		fdb->updated = fdb->used = jiffies;
 		hlist_add_head_rcu(&fdb->hlist, head);
+		fdb_notify(fdb, RTM_NEWNEIGH);
 	}
 	return fdb;
 }
@@ -430,3 +433,125 @@ void br_fdb_update(struct net_bridge *br
 		spin_unlock(&br->hash_lock);
 	}
 }
+
+static int fdb_to_nud(const struct net_bridge_fdb_entry *fdb)
+{
+	if (fdb->is_local)
+		return NUD_PERMANENT;
+	else if (fdb->is_static)
+		return NUD_NOARP;
+	else if (has_expired(fdb->dst->br, fdb))
+		return NUD_STALE;
+	else
+		return NUD_REACHABLE;
+}
+
+static int fdb_fill_info(struct sk_buff *skb,
+			 const struct net_bridge_fdb_entry *fdb,
+			 u32 pid, u32 seq, int type, unsigned int flags)
+{
+	unsigned long now = jiffies;
+	struct nda_cacheinfo ci;
+	struct nlmsghdr *nlh;
+	struct ndmsg *ndm;
+
+	nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ndm), flags);
+	if (nlh == NULL)
+		return -EMSGSIZE;
+
+
+	ndm = nlmsg_data(nlh);
+	ndm->ndm_family	 = AF_BRIDGE;
+	ndm->ndm_pad1    = 0;
+	ndm->ndm_pad2    = 0;
+	ndm->ndm_flags	 = 0;
+	ndm->ndm_type	 = 0;
+	ndm->ndm_ifindex = fdb->dst->dev->ifindex;
+	ndm->ndm_state   = fdb_to_nud(fdb);
+
+	NLA_PUT(skb, NDA_LLADDR, ETH_ALEN, &fdb->addr);
+
+	ci.ndm_used	 = jiffies_to_clock_t(now - fdb->used);
+	ci.ndm_confirmed = 0;
+	ci.ndm_updated	 = jiffies_to_clock_t(now - fdb->updated);
+	ci.ndm_refcnt	 = 0;
+	NLA_PUT(skb, NDA_CACHEINFO, sizeof(ci), &ci);
+
+	return nlmsg_end(skb, nlh);
+
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
+static inline size_t fdb_nlmsg_size(void)
+{
+	return NLMSG_ALIGN(sizeof(struct ndmsg))
+		+ nla_total_size(ETH_ALEN) /* NDA_LLADDR */
+		+ nla_total_size(sizeof(struct nda_cacheinfo));
+}
+
+static void fdb_notify(const struct net_bridge_fdb_entry *fdb, int type)
+{
+	struct net *net = dev_net(fdb->dst->dev);
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(fdb_nlmsg_size(), GFP_ATOMIC);
+	if (skb == NULL)
+		goto errout;
+
+	err = fdb_fill_info(skb, fdb, 0, 0, type, 0);
+	if (err < 0) {
+		/* -EMSGSIZE implies BUG in fdb_nlmsg_size() */
+		WARN_ON(err == -EMSGSIZE);
+		kfree_skb(skb);
+		goto errout;
+	}
+	rtnl_notify(skb, net, 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC);
+	return;
+errout:
+	if (err < 0)
+		rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
+}
+
+/* Dump information about entries, in response to GETNEIGH */
+int br_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct net *net = sock_net(skb->sk);
+	struct net_device *dev;
+	int idx = 0;
+
+	rcu_read_lock();
+	for_each_netdev_rcu(net, dev) {
+		struct net_bridge *br = netdev_priv(dev);
+		int i;
+
+		if (!(dev->priv_flags & IFF_EBRIDGE))
+			continue;
+
+		for (i = 0; i < BR_HASH_SIZE; i++) {
+			struct hlist_node *h;
+			struct net_bridge_fdb_entry *f;
+
+			hlist_for_each_entry_rcu(f, h, &br->hash[i], hlist) {
+				if (idx < cb->args[0])
+					goto skip;
+
+				if (fdb_fill_info(skb, f,
+						  NETLINK_CB(cb->skb).pid,
+						  cb->nlh->nlmsg_seq,
+						  RTM_NEWNEIGH,
+						  NLM_F_MULTI) < 0)
+					break;
+skip:
+				++idx;
+			}
+		}
+	}
+	rcu_read_unlock();
+
+	cb->args[0] = idx;
+
+	return skb->len;
+}
--- a/net/bridge/br_private.h	2011-03-21 10:36:16.608199014 -0700
+++ b/net/bridge/br_private.h	2011-03-21 13:00:19.106207704 -0700
@@ -354,6 +354,7 @@ extern int br_fdb_insert(struct net_brid
 extern void br_fdb_update(struct net_bridge *br,
 			  struct net_bridge_port *source,
 			  const unsigned char *addr);
+extern int br_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb);
 
 /* br_forward.c */
 extern void br_deliver(const struct net_bridge_port *to,
--- a/net/bridge/br_netlink.c	2011-03-21 10:33:38.838610211 -0700
+++ b/net/bridge/br_netlink.c	2011-03-21 13:00:19.090207432 -0700
@@ -196,6 +196,7 @@ int __init br_netlink_init(void)
 
 	/* Only the first call to __rtnl_register can fail */
 	__rtnl_register(PF_BRIDGE, RTM_SETLINK, br_rtm_setlink, NULL);
+	__rtnl_register(PF_BRIDGE, RTM_GETNEIGH, NULL, br_fdb_dump);
 
 	return 0;
 }




* [PATCH 6/7] bridge: allow creating bridge devices with netlink
From: Stephen Hemminger @ 2011-04-05  0:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20110405000326.714524584@vyatta.com>

[-- Attachment #1: br-newlink.patch --]
[-- Type: text/plain, Size: 8289 bytes --]

Add netlink device ops to allow creating a bridge device via netlink.
This works in a manner similar to vlan, macvlan and bonding.

Example:
  # ip link add dev br0 type bridge
  # ip link del dev br0

The change required rearranging the initialization code to deal with
being called from the create-link path. Most of the initialization
happens in br_dev_setup, but allocation of stats is done in the ndo_init
callback so that allocation failure can be handled. Sysfs setup has to
wait until after the network device kobject is registered.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
 net/bridge/br.c         |    1 
 net/bridge/br_device.c  |   41 +++++++++++++++++++++++
 net/bridge/br_if.c      |   83 ++----------------------------------------------
 net/bridge/br_netlink.c |   55 +++++++++++++++++++++++++++----
 net/bridge/br_notify.c  |    6 +++
 5 files changed, 100 insertions(+), 86 deletions(-)
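
The validate hook added in br_netlink.c only looks at the optional link
address. A rough userspace equivalent of that check is sketched below; the
sketch_* names are invented, and a NULL pointer is used here to stand for
"attribute not supplied", which is an assumption of the sketch and not how
the netlink attribute table is represented.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ETH_ALEN 6	/* length of an Ethernet address in bytes */

/* Usable as a device address: neither multicast nor all zeroes. */
bool sketch_is_valid_ether_addr(const unsigned char *addr)
{
	bool multicast = (addr[0] & 0x01) != 0;
	bool zero = true;
	size_t i;

	for (i = 0; i < SKETCH_ETH_ALEN; i++) {
		if (addr[i] != 0)
			zero = false;
	}
	return !multicast && !zero;
}

/* Mirror of the validate step: the address is optional, but if present
 * it must have the right length and be a usable unicast address. */
int sketch_validate_address(const unsigned char *addr, size_t len)
{
	if (addr == NULL)
		return 0;
	if (len != SKETCH_ETH_ALEN)
		return -EINVAL;
	if (!sketch_is_valid_ether_addr(addr))
		return -EADDRNOTAVAIL;
	return 0;
}
```

Doing this check up front means a bad address is refused before any device
is allocated, so there is nothing to unwind on that path.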

--- a/net/bridge/br_device.c	2011-03-22 10:14:12.669553975 -0700
+++ b/net/bridge/br_device.c	2011-03-22 10:25:05.369243058 -0700
@@ -74,6 +74,17 @@ out:
 	return NETDEV_TX_OK;
 }
 
+static int br_dev_init(struct net_device *dev)
+{
+	struct net_bridge *br = netdev_priv(dev);
+
+	br->stats = alloc_percpu(struct br_cpu_netstats);
+	if (!br->stats)
+		return -ENOMEM;
+
+	return 0;
+}
+
 static int br_dev_open(struct net_device *dev)
 {
 	struct net_bridge *br = netdev_priv(dev);
@@ -334,6 +345,7 @@ static const struct ethtool_ops br_ethto
 static const struct net_device_ops br_netdev_ops = {
 	.ndo_open		 = br_dev_open,
 	.ndo_stop		 = br_dev_stop,
+	.ndo_init		 = br_dev_init,
 	.ndo_start_xmit		 = br_dev_xmit,
 	.ndo_get_stats64	 = br_get_stats64,
 	.ndo_set_mac_address	 = br_set_mac_address,
@@ -357,18 +369,47 @@ static void br_dev_free(struct net_devic
 	free_netdev(dev);
 }
 
+static struct device_type br_type = {
+	.name	= "bridge",
+};
+
 void br_dev_setup(struct net_device *dev)
 {
+	struct net_bridge *br = netdev_priv(dev);
+
 	random_ether_addr(dev->dev_addr);
 	ether_setup(dev);
 
 	dev->netdev_ops = &br_netdev_ops;
 	dev->destructor = br_dev_free;
 	SET_ETHTOOL_OPS(dev, &br_ethtool_ops);
+	SET_NETDEV_DEVTYPE(dev, &br_type);
 	dev->tx_queue_len = 0;
 	dev->priv_flags = IFF_EBRIDGE;
 
 	dev->features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA |
 			NETIF_F_GSO_MASK | NETIF_F_NO_CSUM | NETIF_F_LLTX |
 			NETIF_F_NETNS_LOCAL | NETIF_F_GSO | NETIF_F_HW_VLAN_TX;
+
+	br->dev = dev;
+	spin_lock_init(&br->lock);
+	INIT_LIST_HEAD(&br->port_list);
+	spin_lock_init(&br->hash_lock);
+
+	br->bridge_id.prio[0] = 0x80;
+	br->bridge_id.prio[1] = 0x00;
+
+	memcpy(br->group_addr, br_group_address, ETH_ALEN);
+
+	br->feature_mask = dev->features;
+	br->stp_enabled = BR_NO_STP;
+	br->designated_root = br->bridge_id;
+	br->bridge_max_age = br->max_age = 20 * HZ;
+	br->bridge_hello_time = br->hello_time = 2 * HZ;
+	br->bridge_forward_delay = br->forward_delay = 15 * HZ;
+	br->ageing_time = 300 * HZ;
+
+	br_netfilter_rtable_init(br);
+	br_stp_timer_init(br);
+	br_multicast_init(br);
 }
--- a/net/bridge/br_netlink.c	2011-03-22 10:25:01.057042585 -0700
+++ b/net/bridge/br_netlink.c	2011-03-22 10:25:05.369243058 -0700
@@ -12,9 +12,11 @@
 
 #include <linux/kernel.h>
 #include <linux/slab.h>
+#include <linux/etherdevice.h>
 #include <net/rtnetlink.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
+
 #include "br_private.h"
 
 static inline size_t br_nlmsg_size(void)
@@ -188,24 +190,61 @@ static int br_rtm_setlink(struct sk_buff
 	return 0;
 }
 
+static int br_validate(struct nlattr *tb[], struct nlattr *data[])
+{
+	if (tb[IFLA_ADDRESS]) {
+		if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN)
+			return -EINVAL;
+		if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS])))
+			return -EADDRNOTAVAIL;
+	}
+
+	return 0;
+}
+
+static struct rtnl_link_ops br_link_ops __read_mostly = {
+	.kind		= "bridge",
+	.priv_size	= sizeof(struct net_bridge),
+	.setup		= br_dev_setup,
+	.validate	= br_validate,
+};
 
 int __init br_netlink_init(void)
 {
-	if (__rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL, br_dump_ifinfo))
-		return -ENOBUFS;
+	int err;
 
-	/* Only the first call to __rtnl_register can fail */
-	__rtnl_register(PF_BRIDGE, RTM_SETLINK, br_rtm_setlink, NULL);
+	err = rtnl_link_register(&br_link_ops);
+	if (err < 0)
+		goto err1;
 
-	__rtnl_register(PF_BRIDGE, RTM_NEWNEIGH, br_fdb_add, NULL);
-	__rtnl_register(PF_BRIDGE, RTM_DELNEIGH, br_fdb_delete, NULL);
-	__rtnl_register(PF_BRIDGE, RTM_GETNEIGH, NULL, br_fdb_dump);
+	err = __rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL, br_dump_ifinfo);
+	if (err)
+		goto err2;
+	err = __rtnl_register(PF_BRIDGE, RTM_SETLINK, br_rtm_setlink, NULL);
+	if (err)
+		goto err3;
+	err = __rtnl_register(PF_BRIDGE, RTM_NEWNEIGH, br_fdb_add, NULL);
+	if (err)
+		goto err3;
+	err = __rtnl_register(PF_BRIDGE, RTM_DELNEIGH, br_fdb_delete, NULL);
+	if (err)
+		goto err3;
+	err = __rtnl_register(PF_BRIDGE, RTM_GETNEIGH, NULL, br_fdb_dump);
+	if (err)
+		goto err3;
 
 	return 0;
+
+err3:
+	rtnl_unregister_all(PF_BRIDGE);
+err2:
+	rtnl_link_unregister(&br_link_ops);
+err1:
+	return err;
 }
 
 void __exit br_netlink_fini(void)
 {
+	rtnl_link_unregister(&br_link_ops);
 	rtnl_unregister_all(PF_BRIDGE);
 }
-
--- a/net/bridge/br_if.c	2011-03-22 10:24:50.420524900 -0700
+++ b/net/bridge/br_if.c	2011-03-22 10:25:05.369243058 -0700
@@ -175,56 +175,6 @@ static void del_br(struct net_bridge *br
 	unregister_netdevice_queue(br->dev, head);
 }
 
-static struct net_device *new_bridge_dev(struct net *net, const char *name)
-{
-	struct net_bridge *br;
-	struct net_device *dev;
-
-	dev = alloc_netdev(sizeof(struct net_bridge), name,
-			   br_dev_setup);
-
-	if (!dev)
-		return NULL;
-	dev_net_set(dev, net);
-
-	br = netdev_priv(dev);
-	br->dev = dev;
-
-	br->stats = alloc_percpu(struct br_cpu_netstats);
-	if (!br->stats) {
-		free_netdev(dev);
-		return NULL;
-	}
-
-	spin_lock_init(&br->lock);
-	INIT_LIST_HEAD(&br->port_list);
-	spin_lock_init(&br->hash_lock);
-
-	br->bridge_id.prio[0] = 0x80;
-	br->bridge_id.prio[1] = 0x00;
-
-	memcpy(br->group_addr, br_group_address, ETH_ALEN);
-
-	br->feature_mask = dev->features;
-	br->stp_enabled = BR_NO_STP;
-	br->designated_root = br->bridge_id;
-	br->root_path_cost = 0;
-	br->root_port = 0;
-	br->bridge_max_age = br->max_age = 20 * HZ;
-	br->bridge_hello_time = br->hello_time = 2 * HZ;
-	br->bridge_forward_delay = br->forward_delay = 15 * HZ;
-	br->topology_change = 0;
-	br->topology_change_detected = 0;
-	br->ageing_time = 300 * HZ;
-
-	br_netfilter_rtable_init(br);
-
-	br_stp_timer_init(br);
-	br_multicast_init(br);
-
-	return dev;
-}
-
 /* find an available port number */
 static int find_portno(struct net_bridge *br)
 {
@@ -277,42 +227,19 @@ static struct net_bridge_port *new_nbp(s
 	return p;
 }
 
-static struct device_type br_type = {
-	.name	= "bridge",
-};
-
 int br_add_bridge(struct net *net, const char *name)
 {
 	struct net_device *dev;
-	int ret;
 
-	dev = new_bridge_dev(net, name);
+	dev = alloc_netdev(sizeof(struct net_bridge), name,
+			   br_dev_setup);
+
 	if (!dev)
 		return -ENOMEM;
 
-	rtnl_lock();
-	if (strchr(dev->name, '%')) {
-		ret = dev_alloc_name(dev, dev->name);
-		if (ret < 0)
-			goto out_free;
-	}
-
-	SET_NETDEV_DEVTYPE(dev, &br_type);
+	dev_net_set(dev, net);
 
-	ret = register_netdevice(dev);
-	if (ret)
-		goto out_free;
-
-	ret = br_sysfs_addbr(dev);
-	if (ret)
-		unregister_netdevice(dev);
- out:
-	rtnl_unlock();
-	return ret;
-
-out_free:
-	free_netdev(dev);
-	goto out;
+	return register_netdev(dev);
 }
 
 int br_del_bridge(struct net *net, const char *name)
--- a/net/bridge/br.c	2011-03-22 10:13:27.074313779 -0700
+++ b/net/bridge/br.c	2011-03-22 10:25:05.369243058 -0700
@@ -104,3 +104,4 @@ module_init(br_init)
 module_exit(br_deinit)
 MODULE_LICENSE("GPL");
 MODULE_VERSION(BR_VERSION);
+MODULE_ALIAS_RTNL_LINK("bridge");
--- a/net/bridge/br_notify.c	2011-03-22 10:13:27.090305095 -0700
+++ b/net/bridge/br_notify.c	2011-03-22 10:25:05.369243058 -0700
@@ -36,6 +36,12 @@ static int br_device_event(struct notifi
 	struct net_bridge *br;
 	int err;
 
+	/* registration of bridge completed, add sysfs entries */
+	if ((dev->priv_flags & IFF_EBRIDGE) && event == NETDEV_REGISTER) {
+		br_sysfs_addbr(dev);
+		return NOTIFY_DONE;
+	}
+
 	/* not a port of a bridge */
 	p = br_port_get_rtnl(dev);
 	if (!p)

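The NETDEV_REGISTER branch in br_device_event is meant to fire only for
bridge devices, which means testing a single bit of priv_flags. A small
userspace sketch of why the bitwise operator matters for that kind of test
follows. The flag values and function names are made up for illustration
and are not the kernel's.

```c
#include <stdbool.h>

/* Illustrative flag bits only; the real IFF_* values live in <linux/if.h>. */
#define DEMO_FLAG_VLAN   0x1u
#define DEMO_FLAG_BRIDGE 0x2u

/* Bitwise test: true only when the requested bit is set in flags. */
bool demo_has_flag_bitwise(unsigned int flags, unsigned int flag)
{
	return (flags & flag) != 0;
}

/* Logical test: true whenever both operands are nonzero, so any device
 * with some flag set looks like a match. */
bool demo_has_flag_logical(unsigned int flags, unsigned int flag)
{
	return flags && flag;
}
```

With flags holding only DEMO_FLAG_VLAN, the bitwise test for
DEMO_FLAG_BRIDGE is false while the logical one is true.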

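br_validate rejects an IFLA_ADDRESS attribute whose length is not ETH_ALEN
or whose contents fail is_valid_ether_addr. As a rough userspace
illustration of that two-step check, assuming the usual rule that a valid
station address is neither multicast nor all zeroes (all demo_* names are
hypothetical):

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ETH_ALEN 6

/* Multicast and broadcast addresses have the low bit of byte 0 set. */
bool demo_is_multicast(const unsigned char *addr)
{
	return (addr[0] & 0x01) != 0;
}

bool demo_is_zero(const unsigned char *addr)
{
	for (size_t i = 0; i < DEMO_ETH_ALEN; i++)
		if (addr[i])
			return false;
	return true;
}

/* Length first, then address sanity. Returns 0 when the address is
 * acceptable, otherwise the same negative errno values br_validate uses. */
int demo_validate_addr(const unsigned char *addr, size_t len)
{
	if (len != DEMO_ETH_ALEN)
		return -EINVAL;
	if (demo_is_multicast(addr) || demo_is_zero(addr))
		return -EADDRNOTAVAIL;
	return 0;
}
```

Checking the length before looking at the bytes keeps the address test
from reading past a short attribute.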


* [PATCH 5/7] bridge: allow creating/deleting fdb entries via netlink
From: Stephen Hemminger @ 2011-04-05  0:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20110405000326.714524584@vyatta.com>

[-- Attachment #1: br-fdb-newneigh.patch --]
[-- Type: text/plain, Size: 5105 bytes --]

Use RTM_NEWNEIGH and RTM_DELNEIGH to allow updating entries
in the bridge forwarding table. This allows manipulating static entries,
which is not possible with existing tools.

Example (using bridge extensions to iproute2):
   # br fdb add 00:02:03:04:05:06 dev eth0

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
 net/bridge/br_fdb.c     |  139 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_netlink.c |    3 +
 net/bridge/br_private.h |    2 
 3 files changed, 144 insertions(+)

--- a/net/bridge/br_fdb.c	2011-03-22 10:25:00.329008182 -0700
+++ b/net/bridge/br_fdb.c	2011-03-22 10:25:01.057042585 -0700
@@ -555,3 +555,142 @@ skip:
 
 	return skb->len;
 }
+
+/* Create new static fdb entry */
+static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
+			 __u16 state)
+{
+	struct net_bridge *br = source->br;
+	struct hlist_head *head = &br->hash[br_mac_hash(addr)];
+	struct net_bridge_fdb_entry *fdb;
+
+	fdb = fdb_find(head, addr);
+	if (fdb)
+		return -EEXIST;
+
+	fdb = fdb_create(head, source, addr);
+	if (!fdb)
+		return -ENOMEM;
+
+	if (state & NUD_PERMANENT)
+		fdb->is_local = fdb->is_static = 1;
+	else if (state & NUD_NOARP)
+		fdb->is_static = 1;
+	return 0;
+}
+
+/* Add new permanent fdb entry with RTM_NEWNEIGH */
+int br_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+{
+	struct net *net = sock_net(skb->sk);
+	struct ndmsg *ndm;
+	struct nlattr *tb[NDA_MAX+1];
+	struct net_device *dev;
+	struct net_bridge_port *p;
+	const __u8 *addr;
+	int err;
+
+	ASSERT_RTNL();
+	err = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, NULL);
+	if (err < 0)
+		return err;
+
+	ndm = nlmsg_data(nlh);
+	if (ndm->ndm_ifindex == 0) {
+		pr_info("bridge: RTM_NEWNEIGH with invalid ifindex\n");
+		return -EINVAL;
+	}
+
+	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
+	if (dev == NULL) {
+		pr_info("bridge: RTM_NEWNEIGH with unknown ifindex\n");
+		return -ENODEV;
+	}
+
+	if (!tb[NDA_LLADDR] || nla_len(tb[NDA_LLADDR]) != ETH_ALEN) {
+		pr_info("bridge: RTM_NEWNEIGH with invalid address\n");
+		return -EINVAL;
+	}
+
+	addr = nla_data(tb[NDA_LLADDR]);
+	if (!is_valid_ether_addr(addr)) {
+		pr_info("bridge: RTM_NEWNEIGH with invalid ether address\n");
+		return -EINVAL;
+	}
+
+	p = br_port_get_rtnl(dev);
+	if (p == NULL) {
+		pr_info("bridge: RTM_NEWNEIGH %s not a bridge port\n",
+			dev->name);
+		return -EINVAL;
+	}
+
+	spin_lock_bh(&p->br->hash_lock);
+	err = fdb_add_entry(p, addr, ndm->ndm_state);
+	spin_unlock_bh(&p->br->hash_lock);
+
+	return err;
+}
+
+static int fdb_delete_by_addr(struct net_bridge_port *p, const u8 *addr)
+{
+	struct net_bridge *br = p->br;
+	struct hlist_head *head = &br->hash[br_mac_hash(addr)];
+	struct net_bridge_fdb_entry *fdb;
+
+	fdb = fdb_find(head, addr);
+	if (!fdb)
+		return -ENOENT;
+
+	fdb_delete(fdb);
+	return 0;
+}
+
+/* Remove neighbor entry with RTM_DELNEIGH */
+int br_fdb_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+{
+	struct net *net = sock_net(skb->sk);
+	struct ndmsg *ndm;
+	struct net_bridge_port *p;
+	struct nlattr *llattr;
+	const __u8 *addr;
+	struct net_device *dev;
+	int err;
+
+	ASSERT_RTNL();
+	if (nlmsg_len(nlh) < sizeof(*ndm))
+		return -EINVAL;
+
+	ndm = nlmsg_data(nlh);
+	if (ndm->ndm_ifindex == 0) {
+		pr_info("bridge: RTM_DELNEIGH with invalid ifindex\n");
+		return -EINVAL;
+	}
+
+	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
+	if (dev == NULL) {
+		pr_info("bridge: RTM_DELNEIGH with unknown ifindex\n");
+		return -ENODEV;
+	}
+
+	llattr = nlmsg_find_attr(nlh, sizeof(*ndm), NDA_LLADDR);
+	if (llattr == NULL || nla_len(llattr) != ETH_ALEN) {
+		pr_info("bridge: RTM_DELNEIGH with invalid address\n");
+		return -EINVAL;
+	}
+
+	addr = nla_data(llattr);
+
+	p = br_port_get_rtnl(dev);
+	if (p == NULL) {
+		pr_info("bridge: RTM_DELNEIGH %s not a bridge port\n",
+			dev->name);
+		return -EINVAL;
+	}
+
+	spin_lock_bh(&p->br->hash_lock);
+	err = fdb_delete_by_addr(p, addr);
+	spin_unlock_bh(&p->br->hash_lock);
+
+	return err;
+}
--- a/net/bridge/br_netlink.c	2011-03-22 10:25:00.329008182 -0700
+++ b/net/bridge/br_netlink.c	2011-03-22 10:25:01.057042585 -0700
@@ -196,6 +196,9 @@ int __init br_netlink_init(void)
 
 	/* Only the first call to __rtnl_register can fail */
 	__rtnl_register(PF_BRIDGE, RTM_SETLINK, br_rtm_setlink, NULL);
+
+	__rtnl_register(PF_BRIDGE, RTM_NEWNEIGH, br_fdb_add, NULL);
+	__rtnl_register(PF_BRIDGE, RTM_DELNEIGH, br_fdb_delete, NULL);
 	__rtnl_register(PF_BRIDGE, RTM_GETNEIGH, NULL, br_fdb_dump);
 
 	return 0;
--- a/net/bridge/br_private.h	2011-03-22 10:25:00.329008182 -0700
+++ b/net/bridge/br_private.h	2011-03-22 10:25:01.057042585 -0700
@@ -355,6 +355,8 @@ extern void br_fdb_update(struct net_bri
 			  struct net_bridge_port *source,
 			  const unsigned char *addr);
 extern int br_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb);
+extern int br_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
+extern int br_fdb_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
 
 /* br_forward.c */
 extern void br_deliver(const struct net_bridge_port *to,

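For anyone wanting to exercise br_fdb_add without the iproute2 extensions,
here is a sketch of how a userspace tool might lay out the RTM_NEWNEIGH
request that the handler parses: an ndmsg header carrying the port ifindex
and state, followed by one NDA_LLADDR attribute of ETH_ALEN bytes. The
struct and function names are hypothetical, the NLM_F_* flags are an
assumption (the handler in this patch does not look at them), and sending
the buffer over a NETLINK_ROUTE socket is left out.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/neighbour.h>

#ifndef AF_BRIDGE
#define AF_BRIDGE 7
#endif

/* Fixed headers plus room for a few attributes. */
struct fdb_request {
	struct nlmsghdr nlh;
	struct ndmsg ndm;
	char attrs[64];
};

/* Fill req with an RTM_NEWNEIGH request for one MAC address.
 * ifindex names the bridge port; state is NUD_PERMANENT or NUD_NOARP. */
void build_fdb_add(struct fdb_request *req, int ifindex,
		   const unsigned char mac[6], unsigned short state)
{
	struct rtattr *rta;

	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg));
	req->nlh.nlmsg_type = RTM_NEWNEIGH;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
	req->ndm.ndm_family = AF_BRIDGE;
	req->ndm.ndm_ifindex = ifindex;
	req->ndm.ndm_state = state;

	/* Append the link-layer address after the aligned fixed headers. */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nlh.nlmsg_len));
	rta->rta_type = NDA_LLADDR;
	rta->rta_len = RTA_LENGTH(6);
	memcpy(RTA_DATA(rta), mac, 6);
	req->nlh.nlmsg_len = NLMSG_ALIGN(req->nlh.nlmsg_len) +
			     RTA_ALIGN(rta->rta_len);
}
```

A delete request would look the same with RTM_DELNEIGH as the type, since
br_fdb_delete reads the same ifindex and NDA_LLADDR fields.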

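The way fdb_add_entry turns ndm_state into entry flags can be restated as
a standalone function. This is only a userspace restatement of the logic
in the patch; the struct and function names are invented for the example.

```c
#include <stdbool.h>
#include <linux/neighbour.h>

struct demo_fdb_flags {
	bool is_local;
	bool is_static;
};

/* NUD_PERMANENT marks the entry local and static, NUD_NOARP marks it
 * static only, and any other state leaves both flags clear. */
struct demo_fdb_flags demo_flags_from_state(unsigned short state)
{
	struct demo_fdb_flags f = { false, false };

	if (state & NUD_PERMANENT)
		f.is_local = f.is_static = true;
	else if (state & NUD_NOARP)
		f.is_static = true;
	return f;
}
```

An entry added with some other state keeps both flags clear, so it is not
exempt from ageing the way a static entry is.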


* [PATCH 1/7] bridge: change arguments to fdb_create
From: Stephen Hemminger @ 2011-04-05  0:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20110405000326.714524584@vyatta.com>

[-- Attachment #1: br-fdb-reorg.patch --]
[-- Type: text/plain, Size: 1772 bytes --]

A later patch provides the ability to create a non-local static entry.
To make this easier, move the updating of the flag values to
after the code that creates the entry.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/bridge/br_fdb.c	2011-03-21 09:04:33.000000000 -0700
+++ b/net/bridge/br_fdb.c	2011-03-21 10:36:09.359041121 -0700
@@ -320,8 +320,7 @@ static inline struct net_bridge_fdb_entr
 
 static struct net_bridge_fdb_entry *fdb_create(struct hlist_head *head,
 					       struct net_bridge_port *source,
-					       const unsigned char *addr,
-					       int is_local)
+					       const unsigned char *addr)
 {
 	struct net_bridge_fdb_entry *fdb;
 
@@ -329,10 +328,9 @@ static struct net_bridge_fdb_entry *fdb_
 	if (fdb) {
 		memcpy(fdb->addr.addr, addr, ETH_ALEN);
 		fdb->dst = source;
-		fdb->is_local = is_local;
-		fdb->is_static = is_local;
+		fdb->is_local = 0;
+		fdb->is_static = 0;
 		fdb->ageing_timer = jiffies;
-
 		hlist_add_head_rcu(&fdb->hlist, head);
 	}
 	return fdb;
@@ -360,12 +358,15 @@ static int fdb_insert(struct net_bridge
 		fdb_delete(fdb);
 	}
 
-	if (!fdb_create(head, source, addr, 1))
+	fdb = fdb_create(head, source, addr);
+	if (!fdb)
 		return -ENOMEM;
 
+	fdb->is_local = fdb->is_static = 1;
 	return 0;
 }
 
+/* Add entry for local address of interface */
 int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
 		  const unsigned char *addr)
 {
@@ -407,8 +408,9 @@ void br_fdb_update(struct net_bridge *br
 		}
 	} else {
 		spin_lock(&br->hash_lock);
-		if (!fdb_find(head, addr))
-			fdb_create(head, source, addr, 0);
+		if (likely(!fdb_find(head, addr)))
+			fdb_create(head, source, addr);
+
 		/* else  we lose race and someone else inserts
 		 * it first, don't bother updating
 		 */




* [PATCH 3/7] bridge: split rcu and no-rcu cases of fdb lookup
From: Stephen Hemminger @ 2011-04-05  0:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20110405000326.714524584@vyatta.com>

[-- Attachment #1: br-fdb-norcu.patch --]
[-- Type: text/plain, Size: 1349 bytes --]

In some cases the lookup of a forwarding database entry is done under RCU;
in others no RCU is needed because of locking. Split the two
cases into two different loops (and drop the inline).

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/bridge/br_fdb.c	2011-03-21 10:36:51.109460181 -0700
+++ b/net/bridge/br_fdb.c	2011-03-21 10:37:11.872443564 -0700
@@ -305,8 +305,21 @@ int br_fdb_fillbuf(struct net_bridge *br
 	return num;
 }
 
-static inline struct net_bridge_fdb_entry *fdb_find(struct hlist_head *head,
-						    const unsigned char *addr)
+static struct net_bridge_fdb_entry *fdb_find(struct hlist_head *head,
+					     const unsigned char *addr)
+{
+	struct hlist_node *h;
+	struct net_bridge_fdb_entry *fdb;
+
+	hlist_for_each_entry(fdb, h, head, hlist) {
+		if (!compare_ether_addr(fdb->addr.addr, addr))
+			return fdb;
+	}
+	return NULL;
+}
+
+static struct net_bridge_fdb_entry *fdb_find_rcu(struct hlist_head *head,
+						 const unsigned char *addr)
 {
 	struct hlist_node *h;
 	struct net_bridge_fdb_entry *fdb;
@@ -393,7 +406,7 @@ void br_fdb_update(struct net_bridge *br
 	      source->state == BR_STATE_FORWARDING))
 		return;
 
-	fdb = fdb_find(head, addr);
+	fdb = fdb_find_rcu(head, addr);
 	if (likely(fdb)) {
 		/* attempt to update an entry for a local interface */
 		if (unlikely(fdb->is_local)) {

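To illustrate the split between the two lookups outside the kernel, here
is a userspace analogue using C11 atomics. It is only a sketch: acquire
loads stand in for the RCU list traversal, relaxed loads stand in for the
plain traversal done with the hash lock held, and nothing here models
grace periods or freeing of entries. All demo_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

struct demo_entry {
	unsigned char addr[6];
	_Atomic(struct demo_entry *) next;
};

struct demo_head {
	_Atomic(struct demo_entry *) first;
};

/* Writer side, caller holds the lock. The release store publishes the
 * entry so a lockless reader that sees the pointer also sees its fields. */
void demo_add_head(struct demo_head *head, struct demo_entry *e)
{
	struct demo_entry *old =
		atomic_load_explicit(&head->first, memory_order_relaxed);

	atomic_store_explicit(&e->next, old, memory_order_relaxed);
	atomic_store_explicit(&head->first, e, memory_order_release);
}

/* Lookup with the lock held: no concurrent writer, so relaxed loads do. */
struct demo_entry *demo_find(struct demo_head *head,
			     const unsigned char *addr)
{
	struct demo_entry *e;

	for (e = atomic_load_explicit(&head->first, memory_order_relaxed); e;
	     e = atomic_load_explicit(&e->next, memory_order_relaxed))
		if (!memcmp(e->addr, addr, 6))
			return e;
	return NULL;
}

/* Lockless lookup: acquire loads pair with the writer's release store. */
struct demo_entry *demo_find_lockless(struct demo_head *head,
				      const unsigned char *addr)
{
	struct demo_entry *e;

	for (e = atomic_load_explicit(&head->first, memory_order_acquire); e;
	     e = atomic_load_explicit(&e->next, memory_order_acquire))
		if (!memcmp(e->addr, addr, 6))
			return e;
	return NULL;
}
```

The two loops are otherwise identical, which mirrors the patch: the only
difference between the variants is how the list pointers are read.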



* [PATCH 0/7] bridge enhancements for net-next
From: Stephen Hemminger @ 2011-04-05  0:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

These patches add more netlink support for the bridge.
With them, basic bridge configuration is possible with just netlink.
Later enhancements will add statistics and parameters.

The intention is to switch to pure netlink in the future, support
RSTP, and deprecate the old ioctl, sysfs and STP code.



* [PATCH 2/7] bridge: track last used time in forwarding table
From: Stephen Hemminger @ 2011-04-05  0:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20110405000326.714524584@vyatta.com>

[-- Attachment #1: br-fdb-used.patch --]
[-- Type: text/plain, Size: 2660 bytes --]

Add tracking of the last used time in the forwarding table.
Rename ageing_timer to updated to better describe it.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
 net/bridge/br_fdb.c     |   10 +++++-----
 net/bridge/br_input.c   |    5 +++--
 net/bridge/br_private.h |    3 ++-
 3 files changed, 10 insertions(+), 8 deletions(-)

--- a/net/bridge/br_fdb.c	2011-04-03 09:39:13.000000000 -0700
+++ b/net/bridge/br_fdb.c	2011-04-03 09:39:21.221199041 -0700
@@ -62,7 +62,7 @@ static inline int has_expired(const stru
 				  const struct net_bridge_fdb_entry *fdb)
 {
 	return !fdb->is_static &&
-		time_before_eq(fdb->ageing_timer + hold_time(br), jiffies);
+		time_before_eq(fdb->updated + hold_time(br), jiffies);
 }
 
 static inline int br_mac_hash(const unsigned char *mac)
@@ -140,7 +140,7 @@ void br_fdb_cleanup(unsigned long _data)
 			unsigned long this_timer;
 			if (f->is_static)
 				continue;
-			this_timer = f->ageing_timer + delay;
+			this_timer = f->updated + delay;
 			if (time_before_eq(this_timer, jiffies))
 				fdb_delete(f);
 			else if (time_before(this_timer, next_timer))
@@ -293,7 +293,7 @@ int br_fdb_fillbuf(struct net_bridge *br
 
 			fe->is_local = f->is_local;
 			if (!f->is_static)
-				fe->ageing_timer_value = jiffies_to_clock_t(jiffies - f->ageing_timer);
+				fe->ageing_timer_value = jiffies_to_clock_t(jiffies - f->updated);
 			++fe;
 			++num;
 		}
@@ -330,7 +330,7 @@ static struct net_bridge_fdb_entry *fdb_
 		fdb->dst = source;
 		fdb->is_local = 0;
 		fdb->is_static = 0;
-		fdb->ageing_timer = jiffies;
+		fdb->updated = fdb->used = jiffies;
 		hlist_add_head_rcu(&fdb->hlist, head);
 	}
 	return fdb;
@@ -404,7 +404,7 @@ void br_fdb_update(struct net_bridge *br
 		} else {
 			/* fastpath: update of existing entry */
 			fdb->dst = source;
-			fdb->ageing_timer = jiffies;
+			fdb->updated = jiffies;
 		}
 	} else {
 		spin_lock(&br->hash_lock);
--- a/net/bridge/br_input.c	2011-04-01 11:30:16.000000000 -0700
+++ b/net/bridge/br_input.c	2011-04-03 09:39:21.221199041 -0700
@@ -98,9 +98,10 @@ int br_handle_frame_finish(struct sk_buf
 	}
 
 	if (skb) {
-		if (dst)
+		if (dst) {
+			dst->used = jiffies;
 			br_forward(dst->dst, skb, skb2);
-		else
+		} else
 			br_flood_forward(br, skb, skb2);
 	}
 
--- a/net/bridge/br_private.h	2011-04-01 11:30:16.000000000 -0700
+++ b/net/bridge/br_private.h	2011-04-03 09:39:21.221199041 -0700
@@ -64,7 +64,8 @@ struct net_bridge_fdb_entry
 	struct net_bridge_port		*dst;
 
 	struct rcu_head			rcu;
-	unsigned long			ageing_timer;
+	unsigned long			updated;
+	unsigned long			used;
 	mac_addr			addr;
 	unsigned char			is_local;
 	unsigned char			is_static;

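The relationship between the two timestamps and expiry can be shown in a
few lines of userspace C. This is a sketch under the assumption that
time_before_eq() is the usual wraparound-safe signed-difference
comparison. The demo_* names are made up, and the hold time and the
current time are passed in rather than read from the bridge and jiffies.

```c
#include <stdbool.h>

/* Wraparound-safe "a is at or before b" on a free-running counter,
 * modelled on the kernel's time_before_eq(). */
bool demo_time_before_eq(unsigned long a, unsigned long b)
{
	return (long)(b - a) >= 0;
}

struct demo_fdb {
	unsigned long updated;	/* last time the address was learned/refreshed */
	unsigned long used;	/* last time a frame was forwarded via the entry */
	bool is_static;
};

/* Expiry depends only on 'updated'; 'used' does not extend the lifetime. */
bool demo_has_expired(const struct demo_fdb *f, unsigned long hold_time,
		      unsigned long now)
{
	return !f->is_static &&
	       demo_time_before_eq(f->updated + hold_time, now);
}
```

Because the comparison works on the difference, an entry refreshed just
before the counter wraps is still aged correctly afterwards.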



* Re: [GIT PULL nf-2.6] IPVS
From: Simon Horman @ 2011-04-04 23:43 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: lvs-devel, netdev, netfilter-devel, netfilter, Hans Schillstrom,
	Julian Anastasov
In-Reply-To: <4D99C73A.7010101@trash.net>

On Mon, Apr 04, 2011 at 03:27:22PM +0200, Patrick McHardy wrote:
> On 31.03.2011 03:32, Simon Horman wrote:
> > Hi Patrick,
> > 
> > please consider pulling
> > git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git for-patrick
> > to get the following fix from Hans.
> > 
> > I have based this patch on net-2.6/master as nf-2.6/master seems a little
> > out of date (i.e. pre 2.6.39-rc1). Please let me know if you would
> > prefer me to use a different base. Alternatively, feel free to apply
> > the single patch by hand.
> 
> I'll apply this by hand to avoid possible unnecessary merge commits.
> I've also updated nf-2.6.git to net-2.6.git.

Thanks Patrick.



* Re: [PATCH net-next-2.6 5/6] ethtool: Change ETHTOOL_PHYS_ID implementation to allow dropping RTNL
From: Ben Hutchings @ 2011-04-04 23:26 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-net-drivers, Stephen Hemminger,
	Michał Mirosław
In-Reply-To: <1301860429.2935.29.camel@localhost>

This reimplementation lets us blink LEDs on multiple devices at the same
time, but that's pretty pointless.  The nasty thing is we could try to
blink LEDs twice over on the same device, violating the rules that the
drivers depend on.  So I think I need to add:

On Sun, 2011-04-03 at 20:53 +0100, Ben Hutchings wrote:
[...]
> @@ -1618,14 +1620,54 @@ out:
>  static int ethtool_phys_id(struct net_device *dev, void __user *useraddr)
>  {
	static struct net_device *active_dev;

>  	struct ethtool_value id;
> +	int rc;
>  
> -	if (!dev->ethtool_ops->phys_id)
> +	if (!dev->ethtool_ops->set_phys_id && !dev->ethtool_ops->phys_id)
>  		return -EOPNOTSUPP;

	if (active_dev)
		return -EBUSY;

>  	if (copy_from_user(&id, useraddr, sizeof(id)))
>  		return -EFAULT;
>  
> -	return dev->ethtool_ops->phys_id(dev, id.data);
> +	if (!dev->ethtool_ops->set_phys_id)
> +		/* Do it the old way */
> +		return dev->ethtool_ops->phys_id(dev, id.data);
> +
> +	rc = dev->ethtool_ops->set_phys_id(dev, ETHTOOL_ID_ACTIVE);
> +	if (rc && rc != -EINVAL)
> +		return rc;
> +

	active_dev = dev;

> +	dev_hold(dev);
> +	rtnl_unlock();
> +
> +	if (rc == 0) {
> +		/* Driver will handle this itself */
> +		schedule_timeout_interruptible(
> +			id.data ? id.data : MAX_SCHEDULE_TIMEOUT);
> +	} else {
> +		/* Driver expects to be called periodically */
> +		do {
> +			rtnl_lock();
> +			rc = dev->ethtool_ops->set_phys_id(dev, ETHTOOL_ID_ON);
> +			rtnl_unlock();
> +			if (rc)
> +				break;
> +			schedule_timeout_interruptible(HZ / 2);
> +
> +			rtnl_lock();
> +			rc = dev->ethtool_ops->set_phys_id(dev, ETHTOOL_ID_OFF);
> +			rtnl_unlock();
> +			if (rc)
> +				break;
> +			schedule_timeout_interruptible(HZ / 2);
> +		} while (!signal_pending(current) &&
> +			 (id.data == 0 || --id.data != 0));
> +	}
> +
> +	rtnl_lock();
> +	dev_put(dev);

	active_dev = NULL;

> +	(void)dev->ethtool_ops->set_phys_id(dev, ETHTOOL_ID_INACTIVE);
> +	return rc;
>  }
>  
>  static int ethtool_get_stats(struct net_device *dev, void __user *useraddr)

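The control flow above, with the single active_dev guard added, can be
modelled in userspace to check the call sequence a driver would see. This
is only a sketch: locking, sleeping and signal handling are left out, the
blink count must be nonzero (the quoted code treats zero as "until
interrupted"), and every demo_* name is invented for the example.

```c
#include <errno.h>
#include <stddef.h>

enum demo_id_state { DEMO_ID_INACTIVE, DEMO_ID_ACTIVE, DEMO_ID_ON, DEMO_ID_OFF };

struct demo_dev {
	int (*set_phys_id)(struct demo_dev *dev, enum demo_id_state state);
	int on_calls, off_calls, led;
};

/* At most one identify run at a time, as with the proposed active_dev. */
static struct demo_dev *demo_active_dev;

/* Sample driver that asks the caller to toggle the LED for it. */
int demo_blink_driver(struct demo_dev *dev, enum demo_id_state state)
{
	switch (state) {
	case DEMO_ID_ACTIVE:
		return -EINVAL;
	case DEMO_ID_ON:
		dev->led = 1;
		dev->on_calls++;
		return 0;
	case DEMO_ID_OFF:
		dev->led = 0;
		dev->off_calls++;
		return 0;
	case DEMO_ID_INACTIVE:
		dev->led = 0;
		return 0;
	}
	return 0;
}

/* cycles must be nonzero in this model. */
int demo_phys_id(struct demo_dev *dev, unsigned int cycles)
{
	int rc;

	if (demo_active_dev)
		return -EBUSY;

	rc = dev->set_phys_id(dev, DEMO_ID_ACTIVE);
	if (rc && rc != -EINVAL)
		return rc;

	demo_active_dev = dev;
	if (rc == -EINVAL) {
		/* Driver expects to be called periodically; the real loop
		 * sleeps HZ / 2 between the ON and OFF calls. */
		do {
			rc = dev->set_phys_id(dev, DEMO_ID_ON);
			if (rc)
				break;
			rc = dev->set_phys_id(dev, DEMO_ID_OFF);
			if (rc)
				break;
		} while (--cycles != 0);
	}
	demo_active_dev = NULL;
	(void)dev->set_phys_id(dev, DEMO_ID_INACTIVE);
	return rc;
}
```

A second caller arriving while demo_active_dev is set gets -EBUSY before
any driver callback runs, which is the property the added lines are after.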
-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: [PATCH 8/8] ewrk3: convert to set_phys_id
From: Ben Hutchings @ 2011-04-04 23:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev
In-Reply-To: <20110404210805.998401718@linuxplumber.net>

On Mon, 2011-04-04 at 14:06 -0700, Stephen Hemminger wrote:
> Keep original locking and error handling.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> 
> --- a/drivers/net/ewrk3.c	2011-04-04 13:41:41.717791151 -0700
> +++ b/drivers/net/ewrk3.c	2011-04-04 13:46:55.425138028 -0700
> @@ -1604,55 +1604,51 @@ static u32 ewrk3_get_link(struct net_dev
>  	return !(cmr & CMR_LINK);
>  }
>  
> -static int ewrk3_phys_id(struct net_device *dev, u32 data)
> +static int ewrk3_set_phys_id(struct net_device *dev,
> +			     enum ethtool_phys_id_state state)
>  {
>  	struct ewrk3_private *lp = netdev_priv(dev);
>  	unsigned long iobase = dev->base_addr;
>  	unsigned long flags;
>  	u8 cr;
> -	int count;
> -
> -	/* Toggle LED 4x per second */
> -	count = data << 2;
>  
>  	spin_lock_irqsave(&lp->hw_lock, flags);
>  
> -	/* Bail if a PHYS_ID is already in progress */
> -	if (lp->led_mask == 0) {
> -		spin_unlock_irqrestore(&lp->hw_lock, flags);
> -		return -EBUSY;
> -	}
> +	switch (state) {
> +	case ETHTOOL_ID_ACTIVE:
> +		/* Bail if a PHYS_ID is already in progress */
> +		if (lp->led_mask == 0) {
> +			spin_unlock_irqrestore(&lp->hw_lock, flags);
> +			return -EBUSY;
> +		}
[...]

This can never happen.  Well, actually it can with the first version of
my patch, but I'll fix that. :-)

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: [PATCH 4/8] s2io: convert to set_phys_id
From: Ben Hutchings @ 2011-04-04 23:17 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, Jon Mason, netdev
In-Reply-To: <20110404210805.593573200@linuxplumber.net>

On Mon, 2011-04-04 at 14:06 -0700, Stephen Hemminger wrote:
> plain text document attachment (s2io-set-phys.patch)
> Convert to new ethtool set physical id model. Remove no longer used
> timer, and fix docbook comment.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> 
> --- a/drivers/net/s2io.c	2011-04-04 12:56:26.680911740 -0700
> +++ b/drivers/net/s2io.c	2011-04-04 13:12:34.591276022 -0700
> @@ -5484,53 +5484,50 @@ static void s2io_ethtool_gregs(struct ne
>  	}
>  }
>  
> -/**
> - *  s2io_phy_id  - timer function that alternates adapter LED.
> - *  @data : address of the private member of the device structure, which
> - *  is a pointer to the s2io_nic structure, provided as an u32.
> - * Description: This is actually the timer function that alternates the
> - * adapter LED bit of the adapter control bit to set/reset every time on
> - * invocation. The timer is set for 1/2 a second, hence tha NIC blinks
> - *  once every second.
> +/*
> + *  s2io_set_led - control NIC led
>   */
> -static void s2io_phy_id(unsigned long data)
> +static void s2io_set_led(struct s2io_nic *sp, bool on)
>  {
> -	struct s2io_nic *sp = (struct s2io_nic *)data;
>  	struct XENA_dev_config __iomem *bar0 = sp->bar0;
> -	u64 val64 = 0;
> -	u16 subid;
> +	u16 subid = sp->pdev->subsystem_device;
> +	u64 val64;
>  
> -	subid = sp->pdev->subsystem_device;
>  	if ((sp->device_type == XFRAME_II_DEVICE) ||
>  	    ((subid & 0xFF) >= 0x07)) {
>  		val64 = readq(&bar0->gpio_control);
> -		val64 ^= GPIO_CTRL_GPIO_0;
> +		if (on)
> +			val64 |= GPIO_CTRL_GPIO_0;
> +		else
> +			val64 &= ~GPIO_CTRL_GPIO_0;
> +
>  		writeq(val64, &bar0->gpio_control);
>  	} else {
>  		val64 = readq(&bar0->adapter_control);
> -		val64 ^= ADAPTER_LED_ON;
> +		if (on)
> +			val64 |= ADAPTER_LED_ON;
> +		else
> +			val64 &= ~ADAPTER_LED_ON;
> +
>  		writeq(val64, &bar0->adapter_control);
>  	}
>  
> -	mod_timer(&sp->id_timer, jiffies + HZ / 2);
>  }
>  
>  /**
> - * s2io_ethtool_idnic - To physically identify the nic on the system.
> - * @sp : private member of the device structure, which is a pointer to the
> - * s2io_nic structure.
> - * @id : pointer to the structure with identification parameters given by
> - * ethtool.
> + * s2io_ethtool_set_led - To physically identify the nic on the system.
> + * @dev : network device
> + * @state: led setting
> + *
>   * Description: Used to physically identify the NIC on the system.
>   * The Link LED will blink for a time specified by the user for
>   * identification.
>   * NOTE: The Link has to be Up to be able to blink the LED. Hence
>   * identification is possible only if it's link is up.
> - * Return value:
> - * int , returns 0 on success
>   */
>  
> -static int s2io_ethtool_idnic(struct net_device *dev, u32 data)
> +static int s2io_ethtool_set_led(struct net_device *dev,
> +				enum ethtool_phys_id_state state)
>  {
>  	u64 val64 = 0, last_gpio_ctrl_val;
>  	struct s2io_nic *sp = netdev_priv(dev);
> @@ -5543,24 +5540,27 @@ static int s2io_ethtool_idnic(struct net
>  		val64 = readq(&bar0->adapter_control);
>  		if (!(val64 & ADAPTER_CNTL_EN)) {
>  			pr_err("Adapter Link down, cannot blink LED\n");
> -			return -EFAULT;
> +			return -EAGAIN;
>  		}
>  	}
> -	if (sp->id_timer.function == NULL) {
> -		init_timer(&sp->id_timer);
> -		sp->id_timer.function = s2io_phy_id;
> -		sp->id_timer.data = (unsigned long)sp;
> -	}
> -	mod_timer(&sp->id_timer, jiffies);
> -	if (data)
> -		msleep_interruptible(data * HZ);
> -	else
> -		msleep_interruptible(MAX_FLICKER_TIME);
> -	del_timer_sync(&sp->id_timer);
>  
> -	if (CARDS_WITH_FAULTY_LINK_INDICATORS(sp->device_type, subid)) {
> -		writeq(last_gpio_ctrl_val, &bar0->gpio_control);
> -		last_gpio_ctrl_val = readq(&bar0->gpio_control);
> +	switch (state) {
> +	case ETHTOOL_ID_ACTIVE:
> +		return -EINVAL;
> +
> +	case ETHTOOL_ID_ON:
> +		s2io_set_led(sp, true);
> +		break;
> +
> +	case ETHTOOL_ID_OFF:
> +		s2io_set_led(sp, false);
> +		break;
> +
> +	case ETHTOOL_ID_INACTIVE:
> +		if (CARDS_WITH_FAULTY_LINK_INDICATORS(sp->device_type, subid)) {
> +			writeq(last_gpio_ctrl_val, &bar0->gpio_control);
> +			last_gpio_ctrl_val = readq(&bar0->gpio_control);
[...]

I think last_gpio_ctrl_val needs to be moved to struct s2io_nic and
initialised only in the ETHTOOL_ID_ACTIVE case.

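Since each state now arrives in a separate call, a local variable cannot
carry the saved register value from the ACTIVE call to the INACTIVE one.
A userspace model of keeping it in the per-device structure might look
like this. The struct, field and constant names are hypothetical and a
plain integer stands in for the GPIO register.

```c
#include <errno.h>
#include <stdint.h>

enum demo_id_state { DEMO_ID_INACTIVE, DEMO_ID_ACTIVE, DEMO_ID_ON, DEMO_ID_OFF };

#define DEMO_GPIO_LED 0x1ull

struct demo_nic {
	uint64_t gpio_control;		/* stands in for the hardware register */
	uint64_t saved_gpio_control;	/* per-device copy taken on ACTIVE */
};

int demo_set_phys_id(struct demo_nic *nic, enum demo_id_state state)
{
	switch (state) {
	case DEMO_ID_ACTIVE:
		/* Remember the register once, at the start of the run. */
		nic->saved_gpio_control = nic->gpio_control;
		return -EINVAL;	/* ask the caller to drive ON/OFF */
	case DEMO_ID_ON:
		nic->gpio_control |= DEMO_GPIO_LED;
		break;
	case DEMO_ID_OFF:
		nic->gpio_control &= ~DEMO_GPIO_LED;
		break;
	case DEMO_ID_INACTIVE:
		/* Put back whatever was there before the blinking began. */
		nic->gpio_control = nic->saved_gpio_control;
		break;
	}
	return 0;
}
```
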
Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: [PATCH net-next-2.6 00/12] Convert more drivers to ethtool set_phys_id
From: Ben Hutchings @ 2011-04-04 23:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev
In-Reply-To: <20110404184340.604594357@linuxplumber.net>

On Mon, 2011-04-04 at 11:43 -0700, Stephen Hemminger wrote:
> Did a bunch of the easy drivers to convert.

Thanks Stephen.  I would have done some of these but I'm pretty snowed
under at the moment.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

