Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [patch net-next-2.6] forcedeth: fix vlans
From: Michał Mirosław @ 2011-07-26 17:45 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, johnstul, w41ter
In-Reply-To: <1311698510-2599-1-git-send-email-jpirko@redhat.com>

2011/7/26 Jiri Pirko <jpirko@redhat.com>:
> For some reason, when rxaccel is disabled, NV_RX3_VLAN_TAG_PRESENT is
> still set and some pseudorandom vids appear. So check for
> NETIF_F_HW_VLAN_RX as well. Also set correctly hw_features and set vlan
> mode on probe.
>
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> ---
>  drivers/net/forcedeth.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
> index e64cd9c..256a272 100644
> --- a/drivers/net/forcedeth.c
> +++ b/drivers/net/forcedeth.c
> @@ -2764,7 +2764,8 @@ static int nv_rx_process_optimized(struct net_device *dev, int limit)
>                        prefetch(skb->data);
>
>                        vlanflags = le32_to_cpu(np->get_rx.ex->buflow);
> -                       if (vlanflags & NV_RX3_VLAN_TAG_PRESENT) {
> +                       if (dev->features & NETIF_F_HW_VLAN_RX &&
> +                           vlanflags & NV_RX3_VLAN_TAG_PRESENT) {
>                                u16 vid = vlanflags & NV_RX3_VLAN_TAG_MASK;

Please add comment that resembles this patch's description. This is a
bad idea to do in the general case as this will cause VLAN tagged
packets that were in the queue before feature change to be
misinterpreted.

>
>                                __vlan_hwaccel_put_tag(skb, vid);
> @@ -5337,7 +5338,8 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
>        np->vlanctl_bits = 0;
>        if (id->driver_data & DEV_HAS_VLAN) {
>                np->vlanctl_bits = NVREG_VLANCONTROL_ENABLE;
> -               dev->features |= NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX;
> +               dev->hw_features |= NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX;
> +               dev->features |= dev->hw_features;
>        }

For better readability, hw_features -> features copy should be done
only once in this function.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Warning Code: ACCOUNTHELPID67788980
From: Admin Help Desk @ 2011-07-26 17:30 UTC (permalink / raw)


Dear Webmail User,

Warning Code: ACCOUNTHELPID67788980

This message was sent automatically by a program on Webmail which periodically
checks admin center or the size of the inbox, the program to start automatically
on any user to ensure inbox grows too large. If your mailbox is too large, you
will be able to receive new emails. Just before this message was sent, you are
currently running at 20.9 GB, you have exceeded the limit is 20GB.

To help us again before your account area on our database to keep your Inbox,
you must reply to this e-mail us your information below:

E-mail: ... ... ... ... ... ... ... ...
Webmail Access page:.................
Domain/Username: ... ... ... ... ... ... ..
Current Password: ... ... ... ... ... ... ..
Retype password: ... ... ... ...... ... ..

>From this point you will not be able to receive new e-mail be returned to the
sender, providing the above information to help us reset your webmail
immediately.

NOTE: your webmail account expire in three (3) days. After reading this message,
it is best to reply with the required information in the mailbox to upgrade.
Reply to this post Re immediately activate your account.

Thank you for your cooperation.
Help Desk System.
Webmail System Administrator


^ permalink raw reply

* Re: [net-next 10/10] ixgbe: convert to ndo_fix_features
From: Michal Miroslaw @ 2011-07-26 17:34 UTC (permalink / raw)
  To: Skidmore, Donald C
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <C0716E470783B24A8178C9E38CD2F34506137894@orsmsx509.amr.corp.intel.com>

On Mon, Jul 25, 2011 at 03:42:09PM -0700, Skidmore, Donald C wrote:
>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>On Behalf Of Michal Miroslaw
> >On Thu, Jul 21, 2011 at 11:09:11PM -0700, Jeff Kirsher wrote:
> >> From: Don Skidmore <donald.c.skidmore@intel.com>
> >>
> >> Private rx_csum flags are now duplicate of netdev->features &
> >> NETIF_F_RXCSUM.  We remove those duplicates and now use the
> >net_device_ops
> >> ndo_set_features.  This was based on the original patch submitted by
> >> Michal Miroslaw <mirq-linux@rere.qmqm.pl>.  I also removed the special
> >> case not requiring a reset for X540 hardware.  It is needed just as it
> >is
> >> in 82599 hardware.
> >
> >Looks mostly good now. Minor hints below.
> >
> >[...]
> >> +static u32 ixgbe_fix_features(struct net_device *netdev, u32 data)
> >> +{
> >[...]
> >> +	/* Turn off LRO if not RSC capable or invalid ITR settings */
> >> +	if (!(adapter->flags2 & IXGBE_FLAG2_RSC_CAPABLE)) {
> >> +		data &= ~NETIF_F_LRO;
> >> +	} else if (!(adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) &&
> >> +		   (adapter->rx_itr_setting != 1 &&
> >> +		    adapter->rx_itr_setting > IXGBE_MAX_RSC_INT_RATE)) {
> >> +		data &= ~NETIF_F_LRO;
> >> +		e_info(probe, "rx-usecs set too low, not enabling RSC\n");
> >> +	}
> >
> >Better:
> >
> >... else if (data & NETIF_F_LRO && adapter->rx_itr_setting != 1 &&
> >adapter->rx_itr_setting > IXGBE_MAX_RSC_INT_RATE) {
> >	e_info(...)
> >	data &= ~NETIF_F_LRO;
> >}
> >
> 
> I see your point here, the added complexity (checking IXGBE_FLAG2_RSC_ENABLED) is there to cover cases that come up in our out of tree driver's kcompat code.  This is why we mirror these feature flags to begin with.  I would like to modify the driver to use only the feature flags but that will require some interesting kcompat code as well as touch large parts of the driver so I was thinking would probably be best as another patch.  
> 
> My concern in this conditional is when RSC is not just possible but actually turned on.  Currently the NETIF_F_LRO flag is enabled in probe if we are RSC capable.  But it is possible to be RSC capable but not have it enabled due to setting interrupts throttling rates too high.

You should base decisions in ndo_fix_features on what will be enabled in
following ndo_set_features call, not what is enabled at the time of
ndo_fix_features, unless the dependency is not controlled in ndo_set_features
(e.g. MTU).

Best Regards,
Michał Mirosław

^ permalink raw reply

* [patch net-next-2.6] forcedeth: fix vlans
From: Jiri Pirko @ 2011-07-26 16:41 UTC (permalink / raw)
  To: netdev; +Cc: davem, johnstul, w41ter, mirqus

For some reason, when rxaccel is disabled, NV_RX3_VLAN_TAG_PRESENT is
still set and some pseudorandom vids appear. So check for
NETIF_F_HW_VLAN_RX as well. Also set correctly hw_features and set vlan
mode on probe.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/forcedeth.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index e64cd9c..256a272 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -2764,7 +2764,8 @@ static int nv_rx_process_optimized(struct net_device *dev, int limit)
 			prefetch(skb->data);
 
 			vlanflags = le32_to_cpu(np->get_rx.ex->buflow);
-			if (vlanflags & NV_RX3_VLAN_TAG_PRESENT) {
+			if (dev->features & NETIF_F_HW_VLAN_RX &&
+			    vlanflags & NV_RX3_VLAN_TAG_PRESENT) {
 				u16 vid = vlanflags & NV_RX3_VLAN_TAG_MASK;
 
 				__vlan_hwaccel_put_tag(skb, vid);
@@ -5337,7 +5338,8 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 	np->vlanctl_bits = 0;
 	if (id->driver_data & DEV_HAS_VLAN) {
 		np->vlanctl_bits = NVREG_VLANCONTROL_ENABLE;
-		dev->features |= NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX;
+		dev->hw_features |= NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX;
+		dev->features |= dev->hw_features;
 	}
 
 	np->pause_flags = NV_PAUSEFRAME_RX_CAPABLE | NV_PAUSEFRAME_RX_REQ | NV_PAUSEFRAME_AUTONEG;
@@ -5607,6 +5609,8 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 		goto out_error;
 	}
 
+	nv_vlan_mode(dev, dev->features);
+
 	netif_carrier_off(dev);
 
 	dev_info(&pci_dev->dev, "ifname %s, PHY OUI 0x%x @ %d, addr %pM\n",
-- 
1.7.6


^ permalink raw reply related

* RE: [PATCH] cxgb3i: ref count cdev access to prevent modification while in use
From: Karen Xie @ 2011-07-26 16:42 UTC (permalink / raw)
  To: Neil Horman, Divy Le Ray
  Cc: netdev, Steve Wise, David S. Miller, Mike Christie,
	James.Bottomley, linux-scsi
In-Reply-To: <20110726105002.GA16987@hmsreliant.think-freely.org>

The patch looks good to me.

Neil, yes, I will do that.

Thanks,
Karen

-----Original Message-----
From: Neil Horman [mailto:nhorman@tuxdriver.com] 
Sent: Tuesday, July 26, 2011 3:50 AM
To: Divy Le Ray
Cc: netdev@vger.kernel.org; Steve Wise; David S. Miller; Karen Xie
Subject: Re: [PATCH] cxgb3i: ref count cdev access to prevent
modification while in use

On Mon, Jul 25, 2011 at 02:19:01PM -0700, Divy Le Ray wrote:
> On 07/25/2011 12:56 PM, Neil Horman wrote:
> >
> >This oops was reported recently:
> >d:mon> e
> >cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
> >    pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
> >    lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
> >    sp: c0000000fd4c73a0
> >   msr: 8000000000009032
> >   dar: 0
> > dsisr: 40000000
> >  current = 0xc0000000fd640d40
> >  paca    = 0xc00000000054ff80
> >    pid   = 5085, comm = iscsid
> >d:mon> t
> >[c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4
[cxgb3i]
> >[c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8
> >[libcxgbi]
> >[c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18
> >[scsi_transport_iscsi2]
> >[c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
> >[c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
> >[c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
> >[c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
> >[c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
> >[c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
> >[c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
> >--- Exception: c01 (System Call) at 00000080da560cfc
> >
> >The root cause was an EEH error, which sent us down the
> >offload_close path in
> >the cxgb3 driver, which in turn sets cdev->lldev to NULL, without
> >regard for
> >upper layer driver (like the cxgbi drivers) which might have
> >execution contexts
> >in the middle of its use. The result is the oops above, when
> >t3_l2t_get attempts
> >to dereference cdev->lldev right after the EEH error handler sets
> >it to NULL.
> >
> >The fix is to reference count the cdev structure.  When an EEH
> >error occurs, the
> >shutdown path:
>
>t3_adapter_error->offload_close->cxgb3i_remove_clients->cxgb3i_dev_clos
e
> >will now block until such time as the cdev pointer has a use count
> >of zero.
> >This coupled with the fact that lookups will now skip finding any
> >registered
> >cdev's in cxgbi_device_find_by_[lldev|netdev] with the
> >CXGBI_FLAG_ADAPTER_RESET
> >bit set ensures that on an EEH, the setting of lldev to NULL in
> >offload_close
> >will only happen after there are no longer any active users of the
data
> >structure.
> >
> >This has been tested by the reporter and shown to fix the reproted
oops
> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: Divy Le Ray <divy@chelsio.com>
> >CC: Steve Wise <swise@chelsio.com>
> >CC: "David S. Miller" <davem@davemloft.net>
> >
> 
> Also cc-ing Karen.
> 
Thank you Divy. Karen, if you're going to be working on cxgb3i, would
you mind
updating the MAINTINERS file so I and others don't forget to CC you in
the
future?

Thanks!
Neil


^ permalink raw reply

* [PATCH 2/2] net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared
From: Neil Horman @ 2011-07-26 16:05 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Karsten Keil, David S. Miller, Jay Vosburgh,
	Andy Gospodarek, Patrick McHardy, Krzysztof Halasa,
	John W. Linville, Greg Kroah-Hartman, Marcel Holtmann,
	Johannes Berg
In-Reply-To: <1311696338-4739-1-git-send-email-nhorman@tuxdriver.com>

After the last patch, We are left in a state in which only drivers calling
ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real
hardware call ether_setup for their net_devices and don't hold any state in
their skbs.  There are a handful of drivers that violate this assumption of
course, and need to be fixed up.  This patch identifies those drivers, and marks
them as not being able to support the safe transmission of skbs by clearning the
IFF_TX_SKB_SHARING flag in priv_flags

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Karsten Keil <isdn@linux-pingi.de>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Krzysztof Halasa <khc@pm.waw.pl>
CC: "John W. Linville" <linville@tuxdriver.com>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Marcel Holtmann <marcel@holtmann.org>
CC: Johannes Berg <johannes@sipsolutions.net>
---
 drivers/isdn/i4l/isdn_net.c                  |    3 +++
 drivers/net/bonding/bond_main.c              |    6 ++++--
 drivers/net/ifb.c                            |    2 +-
 drivers/net/macvlan.c                        |    2 +-
 drivers/net/tun.c                            |    1 +
 drivers/net/veth.c                           |    2 ++
 drivers/net/wan/hdlc_fr.c                    |    5 +++--
 drivers/net/wireless/airo.c                  |    1 +
 drivers/net/wireless/hostap/hostap_main.c    |    1 +
 drivers/staging/ath6kl/os/linux/ar6000_drv.c |    1 +
 net/8021q/vlan_dev.c                         |    2 +-
 net/bluetooth/bnep/netdev.c                  |    1 +
 net/l2tp/l2tp_eth.c                          |    2 +-
 net/mac80211/iface.c                         |    1 +
 14 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/isdn/i4l/isdn_net.c b/drivers/isdn/i4l/isdn_net.c
index 48e9cc0..1f73d7f 100644
--- a/drivers/isdn/i4l/isdn_net.c
+++ b/drivers/isdn/i4l/isdn_net.c
@@ -2532,6 +2532,9 @@ static void _isdn_setup(struct net_device *dev)
 
 	/* Setup the generic properties */
 	dev->flags = IFF_NOARP|IFF_POINTOPOINT;
+
+	/* isdn prepends a header in the tx path, can't share skbs */
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	dev->header_ops = NULL;
 	dev->netdev_ops = &isdn_netdev_ops;
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 02842d0..df21e84 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1557,8 +1557,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 
 			if (slave_dev->type != ARPHRD_ETHER)
 				bond_setup_by_slave(bond_dev, slave_dev);
-			else
+			else {
 				ether_setup(bond_dev);
+				bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+			}
 
 			netdev_bonding_change(bond_dev,
 					      NETDEV_POST_TYPE_CHANGE);
@@ -4330,7 +4332,7 @@ static void bond_setup(struct net_device *bond_dev)
 	bond_dev->tx_queue_len = 0;
 	bond_dev->flags |= IFF_MASTER|IFF_MULTICAST;
 	bond_dev->priv_flags |= IFF_BONDING;
-	bond_dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
+	bond_dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
 
 	/* At first, we block adding VLANs. That's the only way to
 	 * prevent problems that occur when adding VLANs over an
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 6e82dd3..46b5f5f 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -183,7 +183,7 @@ static void ifb_setup(struct net_device *dev)
 
 	dev->flags |= IFF_NOARP;
 	dev->flags &= ~IFF_MULTICAST;
-	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
+	dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
 	random_ether_addr(dev->dev_addr);
 }
 
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index ba631fc..05172c3 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -572,7 +572,7 @@ void macvlan_common_setup(struct net_device *dev)
 {
 	ether_setup(dev);
 
-	dev->priv_flags	       &= ~IFF_XMIT_DST_RELEASE;
+	dev->priv_flags	       &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
 	dev->netdev_ops		= &macvlan_netdev_ops;
 	dev->destructor		= free_netdev;
 	dev->header_ops		= &macvlan_hard_header_ops,
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 9a6b382..71f3d1a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -528,6 +528,7 @@ static void tun_net_init(struct net_device *dev)
 		dev->netdev_ops = &tap_netdev_ops;
 		/* Ethernet TAP Device */
 		ether_setup(dev);
+		dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 
 		random_ether_addr(dev->dev_addr);
 
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 7f78db7..5b23767 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -263,6 +263,8 @@ static void veth_setup(struct net_device *dev)
 {
 	ether_setup(dev);
 
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+
 	dev->netdev_ops = &veth_netdev_ops;
 	dev->ethtool_ops = &veth_ethtool_ops;
 	dev->features |= NETIF_F_LLTX;
diff --git a/drivers/net/wan/hdlc_fr.c b/drivers/net/wan/hdlc_fr.c
index b25c922..eb20281 100644
--- a/drivers/net/wan/hdlc_fr.c
+++ b/drivers/net/wan/hdlc_fr.c
@@ -1074,9 +1074,10 @@ static int fr_add_pvc(struct net_device *frad, unsigned int dlci, int type)
 
 	used = pvc_is_used(pvc);
 
-	if (type == ARPHRD_ETHER)
+	if (type == ARPHRD_ETHER) {
 		dev = alloc_netdev(0, "pvceth%d", ether_setup);
-	else
+		dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+	} else
 		dev = alloc_netdev(0, "pvc%d", pvc_setup);
 
 	if (!dev) {
diff --git a/drivers/net/wireless/airo.c b/drivers/net/wireless/airo.c
index 55cf71f..e1b3e3c 100644
--- a/drivers/net/wireless/airo.c
+++ b/drivers/net/wireless/airo.c
@@ -2823,6 +2823,7 @@ static struct net_device *_init_airo_card( unsigned short irq, int port,
 	dev->wireless_data = &ai->wireless_data;
 	dev->irq = irq;
 	dev->base_addr = port;
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 
 	SET_NETDEV_DEV(dev, dmdev);
 
diff --git a/drivers/net/wireless/hostap/hostap_main.c b/drivers/net/wireless/hostap/hostap_main.c
index d508482..89a116f 100644
--- a/drivers/net/wireless/hostap/hostap_main.c
+++ b/drivers/net/wireless/hostap/hostap_main.c
@@ -855,6 +855,7 @@ void hostap_setup_dev(struct net_device *dev, local_info_t *local,
 
 	iface = netdev_priv(dev);
 	ether_setup(dev);
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 
 	/* kernel callbacks */
 	if (iface) {
diff --git a/drivers/staging/ath6kl/os/linux/ar6000_drv.c b/drivers/staging/ath6kl/os/linux/ar6000_drv.c
index 48dd9e3..8ff5289 100644
--- a/drivers/staging/ath6kl/os/linux/ar6000_drv.c
+++ b/drivers/staging/ath6kl/os/linux/ar6000_drv.c
@@ -6179,6 +6179,7 @@ int ar6000_create_ap_interface(struct ar6_softc *ar, char *ap_ifname)
     
     ether_setup(dev);
     init_netdev(dev, ap_ifname);
+    dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 
     if (register_netdev(dev)) {
         AR_DEBUG_PRINTF(ATH_DEBUG_ERR,("ar6000_create_ap_interface: register_netdev failed\n"));
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 934e221..9d40a07 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -695,7 +695,7 @@ void vlan_setup(struct net_device *dev)
 	ether_setup(dev);
 
 	dev->priv_flags		|= IFF_802_1Q_VLAN;
-	dev->priv_flags		&= ~IFF_XMIT_DST_RELEASE;
+	dev->priv_flags		&= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
 	dev->tx_queue_len	= 0;
 
 	dev->netdev_ops		= &vlan_netdev_ops;
diff --git a/net/bluetooth/bnep/netdev.c b/net/bluetooth/bnep/netdev.c
index 8c100c9..d4f5dff 100644
--- a/net/bluetooth/bnep/netdev.c
+++ b/net/bluetooth/bnep/netdev.c
@@ -231,6 +231,7 @@ void bnep_net_setup(struct net_device *dev)
 	dev->addr_len = ETH_ALEN;
 
 	ether_setup(dev);
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	dev->netdev_ops = &bnep_netdev_ops;
 
 	dev->watchdog_timeo  = HZ * 2;
diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index a8193f5..d2726a7 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -103,7 +103,7 @@ static struct net_device_ops l2tp_eth_netdev_ops = {
 static void l2tp_eth_dev_setup(struct net_device *dev)
 {
 	ether_setup(dev);
-
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	dev->netdev_ops		= &l2tp_eth_netdev_ops;
 	dev->destructor		= free_netdev;
 }
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index cd5fb40..556e7e6 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -698,6 +698,7 @@ static const struct net_device_ops ieee80211_monitorif_ops = {
 static void ieee80211_if_setup(struct net_device *dev)
 {
 	ether_setup(dev);
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	dev->netdev_ops = &ieee80211_dataif_ops;
 	dev->destructor = free_netdev;
 }
-- 
1.7.6


^ permalink raw reply related

* [PATCH 1/2] net: add IFF_SKB_TX_SHARED flag to priv_flags
From: Neil Horman @ 2011-07-26 16:05 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Robert Olsson, Eric Dumazet, Alexey Dobriyan,
	David S. Miller
In-Reply-To: <1311696338-4739-1-git-send-email-nhorman@tuxdriver.com>

Pktgen attempts to transmit shared skbs to net devices, which can't be used by
some drivers as they keep state information in skbs.  This patch adds a flag
marking drivers as being able to handle shared skbs in their tx path.  Drivers
are defaulted to being unable to do so, but calling ether_setup enables this
flag, as 90% of the drivers calling ether_setup touch real hardware and can
handle shared skbs.  A subsequent patch will audit drivers to ensure that the
flag is set properly

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Jiri Pirko <jpirko@redhat.com>
CC: Robert Olsson <robert.olsson@its.uu.se>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: David S. Miller <davem@davemloft.net>
---
 include/linux/if.h |    2 ++
 net/core/pktgen.c  |    8 +++++---
 net/ethernet/eth.c |    1 +
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/if.h b/include/linux/if.h
index 3bc63e6..03489ca 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -76,6 +76,8 @@
 #define IFF_BRIDGE_PORT	0x4000		/* device used as bridge port */
 #define IFF_OVS_DATAPATH	0x8000	/* device used as Open vSwitch
 					 * datapath port */
+#define IFF_TX_SKB_SHARING	0x10000	/* The interface supports sharing
+					 * skbs on transmit */
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
 #define IF_GET_PROTO	0x0002
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index f76079c..e35a6fb 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -1070,7 +1070,9 @@ static ssize_t pktgen_if_write(struct file *file,
 		len = num_arg(&user_buffer[i], 10, &value);
 		if (len < 0)
 			return len;
-
+		if ((value > 0) &&
+		    (!(pkt_dev->odev->priv_flags & IFF_TX_SKB_SHARING)))
+			return -ENOTSUPP;
 		i += len;
 		pkt_dev->clone_skb = value;
 
@@ -3555,7 +3557,6 @@ static int pktgen_add_device(struct pktgen_thread *t, const char *ifname)
 	pkt_dev->min_pkt_size = ETH_ZLEN;
 	pkt_dev->max_pkt_size = ETH_ZLEN;
 	pkt_dev->nfrags = 0;
-	pkt_dev->clone_skb = pg_clone_skb_d;
 	pkt_dev->delay = pg_delay_d;
 	pkt_dev->count = pg_count_d;
 	pkt_dev->sofar = 0;
@@ -3563,7 +3564,6 @@ static int pktgen_add_device(struct pktgen_thread *t, const char *ifname)
 	pkt_dev->udp_src_max = 9;
 	pkt_dev->udp_dst_min = 9;
 	pkt_dev->udp_dst_max = 9;
-
 	pkt_dev->vlan_p = 0;
 	pkt_dev->vlan_cfi = 0;
 	pkt_dev->vlan_id = 0xffff;
@@ -3575,6 +3575,8 @@ static int pktgen_add_device(struct pktgen_thread *t, const char *ifname)
 	err = pktgen_setup_dev(pkt_dev, ifname);
 	if (err)
 		goto out1;
+	if (pkt_dev->odev->priv_flags & IFF_TX_SKB_SHARING)
+		pkt_dev->clone_skb = pg_clone_skb_d;
 
 	pkt_dev->entry = proc_create_data(ifname, 0600, pg_proc_dir,
 					  &pktgen_if_fops, pkt_dev);
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 5cffb63..4866330 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -339,6 +339,7 @@ void ether_setup(struct net_device *dev)
 	dev->addr_len		= ETH_ALEN;
 	dev->tx_queue_len	= 1000;	/* Ethernet wants good queues */
 	dev->flags		= IFF_BROADCAST|IFF_MULTICAST;
+	dev->priv_flags		= IFF_TX_SKB_SHARING;
 
 	memset(dev->broadcast, 0xFF, ETH_ALEN);
 
-- 
1.7.6


^ permalink raw reply related

* [PATCH 0/2] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods (v3)
From: Neil Horman @ 2011-07-26 16:05 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1311105179-26408-1-git-send-email-nhorman@tuxdriver.com>

Ok, after considering all your comments, Dave suggested this as an alternate
approach:

1) We create a new priv_flag, IFF_SKB_TX_SHARED, to identify drivers capable of
handling shared skbs.  Default is to not set this flag

2) Modify ether_setup to enable this flag, under the assumption that any driver
calling this  function is initalizing a real ethernet device and as such can
handle shared skbs since they don't tend to store state in the skb struct.
Pktgen can then query this flag when a user script attempts to issue the
clone_skb command and decide if it is to be alowed or not.

3) Audit the network drivers calling ether_setup to identify any code doing so
that can't actualy handle shared skbs and manually disable the new flag.  There
are about 10 drivers in this category.

Change notes:
v3) Fixed Erics note in which I tested the length of the passed in string rather
than its converted value for beign > 0

Thoughts/reviews aprpeciated.
Neil

^ permalink raw reply

* Re: [PATCH net-next-2.6 2/2] be2net: use stats-sync to read/write 64-bit stats
From: Stephen Hemminger @ 2011-07-26 14:58 UTC (permalink / raw)
  To: Sathya Perla; +Cc: netdev
In-Reply-To: <1311657015-23465-3-git-send-email-sathya.perla@emulex.com>

On Tue, 26 Jul 2011 10:40:15 +0530
Sathya Perla <sathya.perla@emulex.com> wrote:

> 64-bit stats in be2net are written/read as follows using the stats-sync
> interface for safe access in 32-bit archs:
> 
> 64-bit 		sync			writer			reader
> stats
> ------------------------------------------------------------------------------
> tx_stats	tx_stats->sync		be_xmit			be_get_stats64,
> 								ethtool
> tx-compl	tx_stats->sync_compl	tx-compl-processing	ethtool
> rx-stats	rx_stats->sync		rx-compl-processing	be_get_stats64,
> 								ethtool,
> 								eqd-update
> 
> This patch is based on Stephen Hemminger's earlier patch on the same issue...
> 
> Signed-off-by: Sathya Perla <sathya.perla@emulex.com>

Is the tx complete stat even worth the effort? does it provide a useful metric?
Since rx/tx bytes are already in regular stats, keeping them in ethtool stats
is redundant.

These are just minor nits, you can ignore this advice if you want.

^ permalink raw reply

* Re: [PATCH 1/2] net: add IFF_SKB_TX_SHARED flag to priv_flags
From: Neil Horman @ 2011-07-26 14:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Robert Olsson, Alexey Dobriyan, David S. Miller
In-Reply-To: <1311624682.2996.1.camel@edumazet-laptop>

On Mon, Jul 25, 2011 at 10:11:22PM +0200, Eric Dumazet wrote:
> Le lundi 25 juillet 2011 à 15:45 -0400, Neil Horman a écrit :
> > Pktgen attempts to transmit shared skbs to net devices, which can't be used by
> > some drivers as they keep state information in skbs.  This patch adds a flag
> > marking drivers as being able to handle shared skbs in their tx path.  Drivers
> > are defaulted to being unable to do so, but calling ether_setup enables this
> > flag, as 90% of the drivers calling ether_setup touch real hardware and can
> > handle shared skbs.  A subsequent patch will audit drivers to ensure that the
> > flag is set properly
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > Reported-by: Jiri Pirko <jpirko@redhat.com>
> > CC: Robert Olsson <robert.olsson@its.uu.se>
> > CC: Eric Dumazet <eric.dumazet@gmail.com>
> > CC: Alexey Dobriyan <adobriyan@gmail.com>
> > CC: David S. Miller <davem@davemloft.net>
> > ---
> >  include/linux/if.h |    2 ++
> >  net/core/pktgen.c  |    8 +++++---
> >  net/ethernet/eth.c |    1 +
> >  3 files changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/if.h b/include/linux/if.h
> > index 3bc63e6..03489ca 100644
> > --- a/include/linux/if.h
> > +++ b/include/linux/if.h
> > @@ -76,6 +76,8 @@
> >  #define IFF_BRIDGE_PORT	0x4000		/* device used as bridge port */
> >  #define IFF_OVS_DATAPATH	0x8000	/* device used as Open vSwitch
> >  					 * datapath port */
> > +#define IFF_TX_SKB_SHARING	0x10000	/* The interface supports sharing
> > +					 * skbs on transmit */
> >  
> >  #define IF_GET_IFACE	0x0001		/* for querying only */
> >  #define IF_GET_PROTO	0x0002
> > diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> > index f76079c..53f3f15 100644
> > --- a/net/core/pktgen.c
> > +++ b/net/core/pktgen.c
> > @@ -1070,7 +1070,9 @@ static ssize_t pktgen_if_write(struct file *file,
> >  		len = num_arg(&user_buffer[i], 10, &value);
> >  		if (len < 0)
> >  			return len;
> > -
> > +		if ((len > 0) &&
> 
> 		if ((value > 0) ...
> 
Thank you Eric, I'll respin this shortly
Neil

^ permalink raw reply

* Re: [PATCH] net: Fix security_socket_sendmsg() bypass problem.
From: Eric Paris @ 2011-07-26 13:58 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: anton, casey, mjt, davem, netdev, linux-security-module
In-Reply-To: <201107262021.FCC34304.HMOOFQJVSFLFOt@I-love.SAKURA.ne.jp>

On Tue, Jul 26, 2011 at 7:21 AM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:

> Does SELinux want to receive nosec flag at selinux_socket_sendmsg() because
> calling security_socket_sendmsg() more than once is meaningless for SELinux?

SELinux would not mind having a flag such that we could expedite the
call on the 2nd or later.  Did we finally find the first place where
SELinux is going to be the faster LSM!  Never thought I'd see the day!

-Eric

^ permalink raw reply

* invalid requirement from ethtool?
From: Eli Cohen @ 2011-07-26 12:42 UTC (permalink / raw)
  To: davem; +Cc: netdev

Hi,
I see the following text in include/linux/ethtool.h and wonder what is
the reasoning for requiring that both params cannot be zero. I could
not track when and who inserted this text as it dates before git was
used to track kernel code, but my feeling is that is related to a
specific hardware limitation.

        /* How many packets to delay an RX interrupt after
         * a packet arrives.  If 0, only rx_coalesce_usecs is
         * used.  It is illegal to set both usecs and max frames
         * to zero as this would cause RX interrupts to never be
         * generated.
         */
        __u32   rx_max_coalesced_frames;

        /* How many packets to delay a TX interrupt after
         * a packet is sent.  If 0, only tx_coalesce_usecs is
         * used.  It is illegal to set both usecs and max frames
         * to zero as this would cause TX interrupts to never be
         * generated.
         */
        __u32   tx_max_coalesced_frames;

I found this in tg3 driver:
         /* No rx interrupts will be generated if both are zero */
        if ((ec->rx_coalesce_usecs == 0) &&
            (ec->rx_max_coalesced_frames == 0))
                return -EINVAL;

However, bnx2 for example allows setting both to zero.

I think both params zero should be allowed and mean coalescing is not
operational, thus we can remove these comments from ethtool.h

^ permalink raw reply

* [PATCH] net: sock_sendmsg_nosec() is static
From: Eric Dumazet @ 2011-07-26 12:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Anton Blanchard

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Anton Blanchard <anton@samba.org>
---
 net/socket.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index 02dc82d..b7ce3b7 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -580,7 +580,7 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
 }
 EXPORT_SYMBOL(sock_sendmsg);
 
-int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg, size_t size)
+static int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg, size_t size)
 {
 	struct kiocb iocb;
 	struct sock_iocb siocb;



^ permalink raw reply related

* Re: [PATCH] gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
From: Sebastian Pöhn @ 2011-07-26 12:21 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Linux Netdev
In-Reply-To: <20110726120442.GC2078@minipsycho.brq.redhat.com>

Am Dienstag, den 26.07.2011, 14:04 +0200 schrieb Jiri Pirko:
> Tue, Jul 26, 2011 at 01:13:41PM CEST, sebastian.belden@googlemail.com wrote:
> >Am Dienstag, den 26.07.2011, 12:46 +0200 schrieb Jiri Pirko:
> >> Tue, Jul 26, 2011 at 12:03:13PM CEST, sebastian.belden@googlemail.com wrote:
> >> >commit 87c288c6e9aa31720b72e2bc2d665e24e1653c3e "gianfar: do vlan cleanup" has two issues:
> >> ># permutation of rx and tx flags
> >> ># enabling vlan tag insertion by default (this leads to unusable connections on some configurations)
> >> 
> >> How so? What's causing that?
> >If you enable the VLINS bit of txctrl and do not alter the vlan tag
> >configuration of the NIC, every packet will get a all zero vlan tag
> >(0x8100 0000). If you run a network without vlan awareness all packets
> >will be ignored.
> 
> 
> Interesting hw... I would guess that if gfar_tx_vlan() is not called, no
> tag would be put there...
The Freescale documentation says in detail:

if(FCB has valid VLAN field)
	send packet with this one
else
	send packet with default vlan tag

default vlan tag = dfvlan register = 0x8100 0000 by default
> 
> >
> >In my configuration gianfar system <-> 3c59x system the 3com system
> >discards all packets received with the vlan tag.
> >> 
> >> >
> >> >If VLAN insertion is requested (via ethtool) it will be set at an other point ...
> >> >
> >> >Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>
> >> >---
> >> >
> >> > drivers/net/gianfar.c |    6 +-----
> >> > 1 files changed, 1 insertions(+), 5 deletions(-)
> >> >
> >> >diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> >> >index 835cd25..2659daa 100644
> >> >--- a/drivers/net/gianfar.c
> >> >+++ b/drivers/net/gianfar.c
> >> >@@ -388,12 +388,8 @@ static void gfar_init_mac(struct net_device *ndev)
> >> > 	if (priv->hwts_rx_en)
> >> > 		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
> >> > 
> >> >-	/* keep vlan related bits if it's enabled */
> >> >-	if (ndev->features & NETIF_F_HW_VLAN_TX)
> >> >-		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
> >> >-
> >> > 	if (ndev->features & NETIF_F_HW_VLAN_RX)
> >> >-		tctrl |= TCTRL_VLINS;
> >> >+		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
> >> > 
> >> > 	/* Init rctrl based on our settings */
> >> > 	gfar_write(&regs->rctrl, rctrl);
> >> >
> >> >
> >> 
> >> If you really need that to be done, you should remove NETIF_F_HW_VLAN_TX
> >> from features (not hw_features) (never add it).
> >> 
> >The only thing I did is to let the vlan insertion disabled by default.
> >If someone wants it, it may be enabled via ethtool.
> >
> >



^ permalink raw reply

* Re: [PATCH] gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
From: Jiri Pirko @ 2011-07-26 12:04 UTC (permalink / raw)
  To: Sebastian Pöhn; +Cc: Linux Netdev
In-Reply-To: <1311678821.17939.7.camel@DENEC1DT0191>

Tue, Jul 26, 2011 at 01:13:41PM CEST, sebastian.belden@googlemail.com wrote:
>Am Dienstag, den 26.07.2011, 12:46 +0200 schrieb Jiri Pirko:
>> Tue, Jul 26, 2011 at 12:03:13PM CEST, sebastian.belden@googlemail.com wrote:
>> >commit 87c288c6e9aa31720b72e2bc2d665e24e1653c3e "gianfar: do vlan cleanup" has two issues:
>> ># permutation of rx and tx flags
>> ># enabling vlan tag insertion by default (this leads to unusable connections on some configurations)
>> 
>> How so? What's causing that?
>If you enable the VLINS bit of txctrl and do not alter the vlan tag
>configuration of the NIC, every packet will get a all zero vlan tag
>(0x8100 0000). If you run a network without vlan awareness all packets
>will be ignored.


Interesting hw... I would guess that if gfar_tx_vlan() is not called, no
tag would be put there...

>
>In my configuration gianfar system <-> 3c59x system the 3com system
>discards all packets received with the vlan tag.
>> 
>> >
>> >If VLAN insertion is requested (via ethtool) it will be set at an other point ...
>> >
>> >Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>
>> >---
>> >
>> > drivers/net/gianfar.c |    6 +-----
>> > 1 files changed, 1 insertions(+), 5 deletions(-)
>> >
>> >diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
>> >index 835cd25..2659daa 100644
>> >--- a/drivers/net/gianfar.c
>> >+++ b/drivers/net/gianfar.c
>> >@@ -388,12 +388,8 @@ static void gfar_init_mac(struct net_device *ndev)
>> > 	if (priv->hwts_rx_en)
>> > 		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
>> > 
>> >-	/* keep vlan related bits if it's enabled */
>> >-	if (ndev->features & NETIF_F_HW_VLAN_TX)
>> >-		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
>> >-
>> > 	if (ndev->features & NETIF_F_HW_VLAN_RX)
>> >-		tctrl |= TCTRL_VLINS;
>> >+		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
>> > 
>> > 	/* Init rctrl based on our settings */
>> > 	gfar_write(&regs->rctrl, rctrl);
>> >
>> >
>> 
>> If you really need that to be done, you should remove NETIF_F_HW_VLAN_TX
>> from features (not hw_features) (never add it).
>> 
>The only thing I did is to let the vlan insertion disabled by default.
>If someone wants it, it may be enabled via ethtool.
>
>

^ permalink raw reply

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
From: Christoph Hellwig @ 2011-07-26 11:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen,
	Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
	linux-fsdevel, netdev
In-Reply-To: <1311677013.2355.25.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Jul 26, 2011 at 12:43:33PM +0200, Eric Dumazet wrote:
> BTW, we have one atomic op that could be avoided in new_inode()
> 
> spin_lock(&inode->i_lock);
> inode->i_state = 0;
> spin_unlock(&inode->i_lock);
> 
> can probably be changed to something less expensive...
> 
> inode->i_state = 0;
> smp_wmb();
> 
> Not clear if we really need a memory barrier either....

I think we already had this in some of the earlier vfs/inode scale
series, but it got lost when Al asked to just put the fundamental
changes in.

For plain new_inode() the barrier shouldn't be needed as we take
the sb list lock just a little later.  I'm not sure about your new
variant, so I'll rather lave that to you.

There's a few other things missing from earlier iterations, most notable
the non-atomic i_count, and the bucket locks for the inode hash, if
you're eager enough to look into that area.

^ permalink raw reply

* Re: [PATCH] net: Fix security_socket_sendmsg() bypass problem.
From: Tetsuo Handa @ 2011-07-26 11:21 UTC (permalink / raw)
  To: anton; +Cc: casey, mjt, davem, netdev, linux-security-module
In-Reply-To: <20110726195557.5abcf96d@kryten>

Anton Blanchard wrote:
> Not sure what happened to your email but the gains are evident at
> just 2 packets.

I know your benchmark result, but that result is based on calling
security_socket_sendmsg() only once.

What we worry is that overhead by security_socket_sendmsg() kills the
performance gain for batched case.

> I can help with testing - the commit included a microbenchmark for
> the purposes of analysing its performance.

Yes please. The patch in msg11510.html would want some more discussion, but
the patch in msg11504.html is ready to benchmark on your environment.

Does SELinux want to receive nosec flag at selinux_socket_sendmsg() because
calling security_socket_sendmsg() more than once is meaningless for SELinux?

^ permalink raw reply

* Re: [PATCH] gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
From: Sebastian Pöhn @ 2011-07-26 11:13 UTC (permalink / raw)
  To: Jiri Pirko, Linux Netdev
In-Reply-To: <20110726104559.GA2078@minipsycho.brq.redhat.com>

Am Dienstag, den 26.07.2011, 12:46 +0200 schrieb Jiri Pirko:
> Tue, Jul 26, 2011 at 12:03:13PM CEST, sebastian.belden@googlemail.com wrote:
> >commit 87c288c6e9aa31720b72e2bc2d665e24e1653c3e "gianfar: do vlan cleanup" has two issues:
> ># permutation of rx and tx flags
> ># enabling vlan tag insertion by default (this leads to unusable connections on some configurations)
> 
> How so? What's causing that?
If you enable the VLINS bit of txctrl and do not alter the vlan tag
configuration of the NIC, every packet will get a all zero vlan tag
(0x8100 0000). If you run a network without vlan awareness all packets
will be ignored.

In my configuration gianfar system <-> 3c59x system the 3com system
discards all packets received with the vlan tag.
> 
> >
> >If VLAN insertion is requested (via ethtool) it will be set at an other point ...
> >
> >Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>
> >---
> >
> > drivers/net/gianfar.c |    6 +-----
> > 1 files changed, 1 insertions(+), 5 deletions(-)
> >
> >diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> >index 835cd25..2659daa 100644
> >--- a/drivers/net/gianfar.c
> >+++ b/drivers/net/gianfar.c
> >@@ -388,12 +388,8 @@ static void gfar_init_mac(struct net_device *ndev)
> > 	if (priv->hwts_rx_en)
> > 		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
> > 
> >-	/* keep vlan related bits if it's enabled */
> >-	if (ndev->features & NETIF_F_HW_VLAN_TX)
> >-		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
> >-
> > 	if (ndev->features & NETIF_F_HW_VLAN_RX)
> >-		tctrl |= TCTRL_VLINS;
> >+		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
> > 
> > 	/* Init rctrl based on our settings */
> > 	gfar_write(&regs->rctrl, rctrl);
> >
> >
> 
> If you really need that to be done, you should remove NETIF_F_HW_VLAN_TX
> from features (not hw_features) (never add it).
> 
The only thing I did is to let the vlan insertion disabled by default.
If someone wants it, it may be enabled via ethtool.



^ permalink raw reply

* Re: [PATCH] cxgb3i: ref count cdev access to prevent modification while in use
From: Neil Horman @ 2011-07-26 10:50 UTC (permalink / raw)
  To: Divy Le Ray; +Cc: netdev, Steve Wise, David S. Miller, Karen Xie
In-Reply-To: <4E2DDDC5.8040405@chelsio.com>

On Mon, Jul 25, 2011 at 02:19:01PM -0700, Divy Le Ray wrote:
> On 07/25/2011 12:56 PM, Neil Horman wrote:
> >
> >This oops was reported recently:
> >d:mon> e
> >cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
> >    pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
> >    lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
> >    sp: c0000000fd4c73a0
> >   msr: 8000000000009032
> >   dar: 0
> > dsisr: 40000000
> >  current = 0xc0000000fd640d40
> >  paca    = 0xc00000000054ff80
> >    pid   = 5085, comm = iscsid
> >d:mon> t
> >[c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i]
> >[c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8
> >[libcxgbi]
> >[c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18
> >[scsi_transport_iscsi2]
> >[c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
> >[c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
> >[c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
> >[c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
> >[c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
> >[c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
> >[c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
> >--- Exception: c01 (System Call) at 00000080da560cfc
> >
> >The root cause was an EEH error, which sent us down the
> >offload_close path in
> >the cxgb3 driver, which in turn sets cdev->lldev to NULL, without
> >regard for
> >upper layer driver (like the cxgbi drivers) which might have
> >execution contexts
> >in the middle of its use. The result is the oops above, when
> >t3_l2t_get attempts
> >to dereference cdev->lldev right after the EEH error handler sets
> >it to NULL.
> >
> >The fix is to reference count the cdev structure.  When an EEH
> >error occurs, the
> >shutdown path:
> >t3_adapter_error->offload_close->cxgb3i_remove_clients->cxgb3i_dev_close
> >will now block until such time as the cdev pointer has a use count
> >of zero.
> >This coupled with the fact that lookups will now skip finding any
> >registered
> >cdev's in cxgbi_device_find_by_[lldev|netdev] with the
> >CXGBI_FLAG_ADAPTER_RESET
> >bit set ensures that on an EEH, the setting of lldev to NULL in
> >offload_close
> >will only happen after there are no longer any active users of the data
> >structure.
> >
> >This has been tested by the reporter and shown to fix the reproted oops
> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: Divy Le Ray <divy@chelsio.com>
> >CC: Steve Wise <swise@chelsio.com>
> >CC: "David S. Miller" <davem@davemloft.net>
> >
> 
> Also cc-ing Karen.
> 
Thank you Divy. Karen, if you're going to be working on cxgb3i, would you mind
updating the MAINTINERS file so I and others don't forget to CC you in the
future?

Thanks!
Neil


^ permalink raw reply

* Re: [PATCH] gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
From: Jiri Pirko @ 2011-07-26 10:46 UTC (permalink / raw)
  To: Sebastian Pöhn; +Cc: Linux Netdev
In-Reply-To: <1311674593.17190.7.camel@DENEC1DT0191>

Tue, Jul 26, 2011 at 12:03:13PM CEST, sebastian.belden@googlemail.com wrote:
>commit 87c288c6e9aa31720b72e2bc2d665e24e1653c3e "gianfar: do vlan cleanup" has two issues:
># permutation of rx and tx flags
># enabling vlan tag insertion by default (this leads to unusable connections on some configurations)

How so? What's causing that?

>
>If VLAN insertion is requested (via ethtool) it will be set at an other point ...
>
>Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>
>---
>
> drivers/net/gianfar.c |    6 +-----
> 1 files changed, 1 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
>index 835cd25..2659daa 100644
>--- a/drivers/net/gianfar.c
>+++ b/drivers/net/gianfar.c
>@@ -388,12 +388,8 @@ static void gfar_init_mac(struct net_device *ndev)
> 	if (priv->hwts_rx_en)
> 		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
> 
>-	/* keep vlan related bits if it's enabled */
>-	if (ndev->features & NETIF_F_HW_VLAN_TX)
>-		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
>-
> 	if (ndev->features & NETIF_F_HW_VLAN_RX)
>-		tctrl |= TCTRL_VLINS;
>+		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
> 
> 	/* Init rctrl based on our settings */
> 	gfar_write(&regs->rctrl, rctrl);
>
>

If you really need that to be done, you should remove NETIF_F_HW_VLAN_TX
from features (not hw_features) (never add it).


^ permalink raw reply

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
From: Eric Dumazet @ 2011-07-26 10:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
	Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev
In-Reply-To: <20110726094204.GA12086@infradead.org>

Le mardi 26 juillet 2011 à 05:42 -0400, Christoph Hellwig a écrit :
> On Tue, Jul 26, 2011 at 11:36:34AM +0200, Eric Dumazet wrote:
> > [PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list
> > 
> > Workloads using pipes and sockets hit inode_sb_list_lock contention.
> > 
> > superblock s_inodes list is needed for quota, dirty, pagecache and
> > fsnotify management. pipe/anon/socket fs are clearly not candidates for
> > these.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Looks good to me,
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 

Thanks !

BTW, we have one atomic op that could be avoided in new_inode()

spin_lock(&inode->i_lock);
inode->i_state = 0;
spin_unlock(&inode->i_lock);

can probably be changed to something less expensive...

inode->i_state = 0;
smp_wmb();

Not clear if we really need a memory barrier either....



--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
From: Sebastian Pöhn @ 2011-07-26 10:03 UTC (permalink / raw)
  To: Linux Netdev, jpirko

commit 87c288c6e9aa31720b72e2bc2d665e24e1653c3e "gianfar: do vlan cleanup" has two issues:
# permutation of rx and tx flags
# enabling vlan tag insertion by default (this leads to unusable connections on some configurations)

If VLAN insertion is requested (via ethtool) it will be set at an other point ...

Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>
---

 drivers/net/gianfar.c |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 835cd25..2659daa 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -388,12 +388,8 @@ static void gfar_init_mac(struct net_device *ndev)
 	if (priv->hwts_rx_en)
 		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
 
-	/* keep vlan related bits if it's enabled */
-	if (ndev->features & NETIF_F_HW_VLAN_TX)
-		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
-
 	if (ndev->features & NETIF_F_HW_VLAN_RX)
-		tctrl |= TCTRL_VLINS;
+		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
 
 	/* Init rctrl based on our settings */
 	gfar_write(&regs->rctrl, rctrl);



^ permalink raw reply related

* Re: [PATCH] net: Fix security_socket_sendmsg() bypass problem.
From: Anton Blanchard @ 2011-07-26  9:55 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: casey, mjt, davem, netdev, linux-security-module
In-Reply-To: <201107260143.CGH18263.FOOSVMOQFJFLHt@I-love.SAKURA.ne.jp>


Hi,

> I fear it too. Unless many dozens (maybe some hundreds) of packets
> are sent by sendmmsg(), msg11504.html might show better performance
> than msg11510.html . But I don't have a machine to benchmark.

Not sure what happened to your email but the gains are evident at
just 2 packets.

I can help with testing - the commit included a microbenchmark for
the purposes of analysing its performance.

Anton

--

    net: Add sendmmsg socket system call
    
    This patch adds a multiple message send syscall and is the send
    version of the existing recvmmsg syscall. This is heavily
    based on the patch by Arnaldo that added recvmmsg.
    
    I wrote a microbenchmark to test the performance gains of using
    this new syscall:
    
    http://ozlabs.org/~anton/junkcode/sendmmsg_test.c
    
    The test was run on a ppc64 box with a 10 Gbit network card. The
    benchmark can send both UDP and RAW ethernet packets.
    
    64B UDP
    
    batch   pkts/sec
    1       804570
    2       872800 (+ 8 %)
    4       916556 (+14 %)
    8       939712 (+17 %)
    16      952688 (+18 %)
    32      956448 (+19 %)
    64      964800 (+20 %)
    
    64B raw socket
    
    batch   pkts/sec
    1       1201449
    2       1350028 (+12 %)
    4       1461416 (+22 %)
    8       1513080 (+26 %)
    16      1541216 (+28 %)
    32      1553440 (+29 %)
    64      1557888 (+30 %)
    
    We see a 20% improvement in throughput on UDP send and 30%
    on raw socket send.
    
    [ Add sparc syscall entries. -DaveM ]

^ permalink raw reply

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
From: Christoph Hellwig @ 2011-07-26  9:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen,
	Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
	linux-fsdevel, netdev
In-Reply-To: <1311672994.2355.17.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Jul 26, 2011 at 11:36:34AM +0200, Eric Dumazet wrote:
> [PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list
> 
> Workloads using pipes and sockets hit inode_sb_list_lock contention.
> 
> superblock s_inodes list is needed for quota, dirty, pagecache and
> fsnotify management. pipe/anon/socket fs are clearly not candidates for
> these.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks good to me,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
From: Eric Dumazet @ 2011-07-26  9:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
	Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev
In-Reply-To: <20110726090357.GA13013@infradead.org>

Le mardi 26 juillet 2011 à 05:03 -0400, Christoph Hellwig a écrit :
> On Tue, Jul 26, 2011 at 10:21:06AM +0200, Eric Dumazet wrote:
> > Well, not 'last' contention point, as we still hit remove_inode_hash(),
> 
> There should be no ned to put pipe or anon inodes on the inode hash.
> Probably sockets don't need it either, but I'd need to look at it in
> detail.
> 
> > inode_wb_list_del()
> 
> The should never be on the wb list either, doing an unlocked check for
> actually beeing on the list before taking the lock should help you.

Yes, it might even help regular inodes ;)

> 
> > inode_lru_list_del(),
> 
> No real need to keep inodes in the LRU if we only allocate them using
> new_inode but never look them up either.  You might want to try setting
> .drop_inode to generic_delete_inode for these.

Yes, I'll take a look, thanks.

> 
> > +struct inode *__new_inode(struct super_block *sb)
> > +{
> > +	struct inode *inode = alloc_inode(sb);
> > +
> > +	if (inode) {
> > +		spin_lock(&inode->i_lock);
> > +		inode->i_state = 0;
> > +		spin_unlock(&inode->i_lock);
> > +		INIT_LIST_HEAD(&inode->i_sb_list);
> > +	}
> > +	return inode;
> > +}
> 
> This needs a much better name like new_inode_pseudo, and a kerneldoc 
> comment explaining when it is safe to use, and the consequences, which
> appear to me:
> 
>  - fs may never be unmount
>  - quotas can't work on the filesystem
>  - writeback can't work on the filesystem

Thanks for reviewing, here is v2 of the patch, addressing your comments.


[PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list

Workloads using pipes and sockets hit inode_sb_list_lock contention.

superblock s_inodes list is needed for quota, dirty, pagecache and
fsnotify management. pipe/anon/socket fs are clearly not candidates for
these.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
v2: address Christoph comments

 fs/anon_inodes.c   |    2 +-
 fs/inode.c         |   39 ++++++++++++++++++++++++++++++---------
 fs/pipe.c          |    2 +-
 include/linux/fs.h |    3 ++-
 net/socket.c       |    2 +-
 5 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 4d433d3..f11e43e 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -187,7 +187,7 @@ EXPORT_SYMBOL_GPL(anon_inode_getfd);
  */
 static struct inode *anon_inode_mkinode(void)
 {
-	struct inode *inode = new_inode(anon_inode_mnt->mnt_sb);
+	struct inode *inode = new_inode_pseudo(anon_inode_mnt->mnt_sb);
 
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
diff --git a/fs/inode.c b/fs/inode.c
index 96c77b8..319b93b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -362,9 +362,11 @@ EXPORT_SYMBOL_GPL(inode_sb_list_add);
 
 static inline void inode_sb_list_del(struct inode *inode)
 {
-	spin_lock(&inode_sb_list_lock);
-	list_del_init(&inode->i_sb_list);
-	spin_unlock(&inode_sb_list_lock);
+	if (!list_empty(&inode->i_sb_list)) {
+		spin_lock(&inode_sb_list_lock);
+		list_del_init(&inode->i_sb_list);
+		spin_unlock(&inode_sb_list_lock);
+	}
 }
 
 static unsigned long hash(struct super_block *sb, unsigned long hashval)
@@ -797,6 +799,29 @@ unsigned int get_next_ino(void)
 EXPORT_SYMBOL(get_next_ino);
 
 /**
+ *	new_inode_pseudo 	- obtain an inode
+ *	@sb: superblock
+ *
+ *	Allocates a new inode for given superblock.
+ *	Inode wont be chained in superblock s_inodes list
+ *	This means :
+ *	- fs can't be unmount
+ *	- quotas, fsnotify, writeback can't work
+ */
+struct inode *new_inode_pseudo(struct super_block *sb)
+{
+	struct inode *inode = alloc_inode(sb);
+
+	if (inode) {
+		spin_lock(&inode->i_lock);
+		inode->i_state = 0;
+		spin_unlock(&inode->i_lock);
+		INIT_LIST_HEAD(&inode->i_sb_list);
+	}
+	return inode;
+}
+
+/**
  *	new_inode 	- obtain an inode
  *	@sb: superblock
  *
@@ -814,13 +839,9 @@ struct inode *new_inode(struct super_block *sb)
 
 	spin_lock_prefetch(&inode_sb_list_lock);
 
-	inode = alloc_inode(sb);
-	if (inode) {
-		spin_lock(&inode->i_lock);
-		inode->i_state = 0;
-		spin_unlock(&inode->i_lock);
+	inode = new_inode_pseudo(sb);
+	if (inode)
 		inode_sb_list_add(inode);
-	}
 	return inode;
 }
 EXPORT_SYMBOL(new_inode);
diff --git a/fs/pipe.c b/fs/pipe.c
index 1b7f9af..0e0be1d 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -948,7 +948,7 @@ static const struct dentry_operations pipefs_dentry_operations = {
 
 static struct inode * get_pipe_inode(void)
 {
-	struct inode *inode = new_inode(pipe_mnt->mnt_sb);
+	struct inode *inode = new_inode_pseudo(pipe_mnt->mnt_sb);
 	struct pipe_inode_info *pipe;
 
 	if (!inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a665804..cc363fa 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2310,7 +2310,8 @@ extern void __iget(struct inode * inode);
 extern void iget_failed(struct inode *);
 extern void end_writeback(struct inode *);
 extern void __destroy_inode(struct inode *);
-extern struct inode *new_inode(struct super_block *);
+extern struct inode *new_inode_pseudo(struct super_block *sb);
+extern struct inode *new_inode(struct super_block *sb);
 extern void free_inode_nonrcu(struct inode *inode);
 extern int should_remove_suid(struct dentry *);
 extern int file_remove_suid(struct file *);
diff --git a/net/socket.c b/net/socket.c
index 02dc82d..26ed35c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -467,7 +467,7 @@ static struct socket *sock_alloc(void)
 	struct inode *inode;
 	struct socket *sock;
 
-	inode = new_inode(sock_mnt->mnt_sb);
+	inode = new_inode_pseudo(sock_mnt->mnt_sb);
 	if (!inode)
 		return NULL;
 



^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox