Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: unregister_netdevice: waiting for lo to become free. Usage count = 8
From: Julian Anastasov @ 2011-04-15 20:11 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: Simon Horman, netdev, lvs-devel, Eric W. Biederman
In-Reply-To: <201104150901.47214.hans@schillstrom.com>


	Hello,

On Fri, 15 Apr 2011, Hans Schillstrom wrote:

> Hello Julian
> 
> I'm trying to fix the cleanup process when a namespace get "killed",
> which is a new feature for ipvs. However an old problem appears again
> 
> When there has been traffic trough ipvs where the destination is unreachable
> the usage count on loopback dev increases one for every packet....

	What is the kernel version?

> I guess thats because of this rule :
> 
> # ip route list table all
> ...
> unreachable default dev lo  table 0  proto kernel  metric 4294967295  error -101 hoplimit 25
> ...
> 
> I made a test just forwarding packets through the same container (ipvs loaded)
> to an unreachable destination and that test had a balanced count i.e. it was possible to reboot the container.

	Can you explain, what do you mean with unreachable
destination? Are you adding some rejecting route?

> Do you have an idea why  this happens in the ipvs case ?

	Do you see with debug level 3 the "Removing destination"
messages. Only real servers can hold dest->dst_cache reference
for dev which can be a problem because the real servers are not
deleted immediately - on traffic they are moved to trash
list. But ip_vs_trash_cleanup() should remove any left
structures. You should check in debug that all servers are
deleted. If all real server structures are freed but
problem remains we should look more deeply in the
dest->dst_cache usage. DR or NAT is used?

	I assume cleanup really happens in this order:

ip_vs_cleanup():
	nf_unregister_hooks()
	...
	ip_vs_conn_cleanup()
	...
	ip_vs_control_cleanup()

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: Feature request: "inverted" ping -a (beep on failure)
From: Randy Dunlap @ 2011-04-15 19:49 UTC (permalink / raw)
  To: Christian Boltz; +Cc: netdev
In-Reply-To: <201104152135.33171@tux.boltz.de.vu>

On Fri, 15 Apr 2011 21:35:32 +0200 Christian Boltz wrote:

> Hello,
> 
> ping -a (beep on ping success) is a quite useful command, but it can be 
> annoying.
> 
> I'd like to have the exact opposite of it: beep when pinging fails.
> 
> I understand that this is slightly difficult because "ping success" is 
> easier to detect (incoming package) than "ping failure" (no incoming 
> package or firewall reject) - my proposal is to have a timeout for every 
> package (if no reply package comes in) and beep if no reply is seen 
> after the timeout is over.
> 
> For the timeout, the -W option could be used. The default timeout seems 
> to be 10 seconds, which is OK.
> 
> Usecase / why this would be useful for me:
> Basically for server monitoring. The exact usecase is that I have rented 
> a "root server" and asked the hoster to exchange a broken harddisk.
> With the "inverted" ping -a, it would be easy to notice when they switch 
> off the server to replace the disk.
> 
> Please consider this feature for the next version of ping ;-)
> 
> 
> (The iputils homepage does not list any bugtracker or similar, therefore 
> I'm asking here.)

Couldn't you look for exit code (status) 1 and then do a bell/beep
(or play a sound file :)?

Or do you want ping to beep and then continue running?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply

* Feature request: "inverted" ping -a (beep on failure)
From: Christian Boltz @ 2011-04-15 19:35 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA

Hello,

ping -a (beep on ping success) is a quite useful command, but it can be 
annoying.

I'd like to have the exact opposite of it: beep when pinging fails.

I understand that this is slightly difficult because "ping success" is 
easier to detect (incoming package) than "ping failure" (no incoming 
package or firewall reject) - my proposal is to have a timeout for every 
package (if no reply package comes in) and beep if no reply is seen 
after the timeout is over.

For the timeout, the -W option could be used. The default timeout seems 
to be 10 seconds, which is OK.

Usecase / why this would be useful for me:
Basically for server monitoring. The exact usecase is that I have rented 
a "root server" and asked the hoster to exchange a broken harddisk.
With the "inverted" ping -a, it would be easy to notice when they switch 
off the server to replace the disk.

Please consider this feature for the next version of ping ;-)

(The iputils homepage does not list any bugtracker or similar, therefore 
I'm asking here.)

Gruß

Christian Boltz
-- 
"we will support any library from any repo combined with
any application" is something that NO ONE does.
Or if they do, they are insane, or lying, or both.
[Greg KH in opensuse-factory]
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC net-next] bonding: notify when bonding device address changes
From: Jay Vosburgh @ 2011-04-15 19:28 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Nicolas de Pesloüan, Michał Górny, netdev, roy,
	Andy Gospodarek
In-Reply-To: <20110415121054.73717900@nehalam>

Stephen Hemminger <shemminger@vyatta.com> wrote:

>When a device changes its hardware address, it needs to call the network
>device notifiers to inform protocols.
>
>Compile tested only.

	We'll need to test this, I think.  If I'm not mistaken, I
believe that inetdev_event will issue gratuitous ARPs when it gets the
NETDEV_CHANGEADDR, and we need to make sure those are correct for all
cases.

>Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
>--- a/drivers/net/bonding/bond_main.c	2011-04-15 11:21:02.142866195 -0700
>+++ b/drivers/net/bonding/bond_main.c	2011-04-15 11:28:06.491408825 -0700
>@@ -967,9 +967,11 @@ static void bond_do_fail_over_mac(struct
>
> 	switch (bond->params.fail_over_mac) {
> 	case BOND_FOM_ACTIVE:
>-		if (new_active)
>+		if (new_active) {
> 			memcpy(bond->dev->dev_addr,  new_active->dev->dev_addr,
> 			       new_active->dev->addr_len);
>+			call_netdevice_notifiers(NETDEV_CHANGEADDR, bond->dev);
>+		}
> 		break;
> 	case BOND_FOM_FOLLOW:
> 		/*
>@@ -1386,6 +1388,7 @@ static int bond_sethwaddr(struct net_dev
> 	pr_debug("slave_dev=%p\n", slave_dev);
> 	pr_debug("slave_dev->addr_len=%d\n", slave_dev->addr_len);
> 	memcpy(bond_dev->dev_addr, slave_dev->dev_addr, slave_dev->addr_len);
>+	call_netdevice_notifiers(NETDEV_CHANGEADDR, bond_dev);
> 	return 0;
> }
>
>@@ -1644,10 +1647,11 @@ int bond_enslave(struct net_device *bond
>
> 	/* If this is the first slave, then we need to set the master's hardware
> 	 * address to be the same as the slave's. */
>-	if (is_zero_ether_addr(bond->dev->dev_addr))
>+	if (is_zero_ether_addr(bond->dev->dev_addr)) {
> 		memcpy(bond->dev->dev_addr, slave_dev->dev_addr,
> 		       slave_dev->addr_len);
>-
>+		call_netdevice_notifiers(NETDEV_CHANGEADDR, bond->dev);
>+	}
>
> 	new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL);
> 	if (!new_slave) {
>@@ -2067,6 +2071,7 @@ int bond_release(struct net_device *bond
> 		 * to the mac address of the first slave
> 		 */
> 		memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
>+		call_netdevice_notifiers(NETDEV_CHANGEADDR, bond_dev);

	This one in particular I'm not sure about; should the system
send a gratuitous ARP for a MAC address of all zeroes?

> 		if (!bond->vlgrp) {
> 			bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
>@@ -2252,6 +2257,7 @@ static int bond_release_all(struct net_d
> 	 * first slave
> 	 */
> 	memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
>+	call_netdevice_notifiers(NETDEV_CHANGEADDR, bond_dev);

	Same comment for this one.

> 	if (!bond->vlgrp) {
> 		bond_dev->features |= NETIF_F_VLAN_CHALLENGED;

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: The bonding driver should notify userspace of MAC address change
From: Jay Vosburgh @ 2011-04-15 19:22 UTC (permalink / raw)
  To: Phil Oester
  Cc: =?UTF-8?B?Tmljb2xhcyBkZSBQZXNsb8O8YW4=?=,
	=?UTF-8?B?TWljaGHFgiBHw7Nybnk=?=, netdev, roy, Andy Gospodarek
In-Reply-To: <20110415191249.GA6879@linuxace.com>

Phil Oester <kernel@linuxace.com> wrote:

>On Fri, Apr 15, 2011 at 11:53:10AM -0700, Jay Vosburgh wrote:
>> >A bonding device should not report link up to userspace until at least one slave is present and up.
>> >
>> >And possibly, a bonding device should report link down if all slaves are down or all slave were removed.
>> >
>> >Jay, Andy, does this sounds sensible to you?
>> 
>> 	I was just reading their bug and doing an experiment; I don't
>> see that bonding reports carrier up until there's at least one slave
>> (even if it's configured up), e.g.,
>> 
>
>This was only recently fixed - see e826eafa65c6f1f7c8db5a237556cebac57ebcc5
>(bonding: Call netif_carrier_off after register_netdevice). Perhaps the
>reporter is not using a recent kernel?

	Yah, I looked that up after I'd sent my prior email.  Since that
change went in only a month ago, maybe they don't have it.

	On the other hand, I did my test on a FC 14 kernel,
2.6.35.6-45.fc14, which claims to have been built in October 2010, and
it seemed to behave properly.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: The bonding driver should notify userspace of MAC address change
From: Nicolas de Pesloüan @ 2011-04-15 19:22 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Michał Górny, netdev, roy, Andy Gospodarek
In-Reply-To: <10227.1302893590@death>

Le 15/04/2011 20:53, Jay Vosburgh a écrit :
> Nicolas de Pesloüan 	<nicolas.2p.debian@gmail.com>  wrote:
>
>> Le 15/04/2011 18:44, Michał Górny a écrit :
>>> Hello,
>>>
>>> I'd like to file a feature request for the bonding driver. Currently,
>>> there is no way for userspace to know whether the driver actually gets
>>> a MAC address. This results in the fact that dhcpcd sends MAC-less DHCP
>>> packets through bonding device if it is started before bond gets any
>>> slaves.
>>
>> A similar subject, involving bridge instead of bonding, was discussed a
>> few weeks ago in this thread:
>> http://marc.info/?l=linux-netdev&m=129939017116310&w=2
>>
>> In particular, I suggested to apply Stephen's suggestion not only to bridge but also to bonding.
>>
>> (http://marc.info/?l=linux-netdev&m=129948385024680&w=2)
>>
>> A bonding device should not report link up to userspace until at least one slave is present and up.
>>
>> And possibly, a bonding device should report link down if all slaves are down or all slave were removed.
>>
>> Jay, Andy, does this sounds sensible to you?
>
> 	I was just reading their bug and doing an experiment; I don't
> see that bonding reports carrier up until there's at least one slave
> (even if it's configured up), e.g.,
>
> # modprobe bonding
> # ifconfig bond0 up
> # cat /sys/class/net/bond0/carrier
> 0
> # echo +eth5>  /sys/class/net/bond0/bonding/slaves
> # cat /sys/class/net/bond0/carrier
> 1
>
> 	If there's a slave, there's a MAC assigned, since bond_enslave
> sets the master's MAC before it calls bond_set_carrier.
>
> 	In bond_create, as soon as register_netdevice returns, we call
> netif_carrier_off, and it stays off until bond_enslave runs
> successfully.

Agreed.

> 	Is there some race window there between the register and the
> netif_carrier_off?

It might be that dhcpd does not wait for link to be up before starting to send DHCP requests.

	Nicolas.

^ permalink raw reply

* Re: The bonding driver should notify userspace of MAC address change
From: Phil Oester @ 2011-04-15 19:12 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: =?UTF-8?B?Tmljb2xhcyBkZSBQZXNsb8O8YW4=?=,
	=?UTF-8?B?TWljaGHFgiBHw7Nybnk=?=, netdev, roy, Andy Gospodarek
In-Reply-To: <10227.1302893590@death>

On Fri, Apr 15, 2011 at 11:53:10AM -0700, Jay Vosburgh wrote:
> >A bonding device should not report link up to userspace until at least one slave is present and up.
> >
> >And possibly, a bonding device should report link down if all slaves are down or all slave were removed.
> >
> >Jay, Andy, does this sounds sensible to you?
> 
> 	I was just reading their bug and doing an experiment; I don't
> see that bonding reports carrier up until there's at least one slave
> (even if it's configured up), e.g.,
> 

This was only recently fixed - see e826eafa65c6f1f7c8db5a237556cebac57ebcc5
(bonding: Call netif_carrier_off after register_netdevice). Perhaps the
reporter is not using a recent kernel?

Phil 

^ permalink raw reply

* [RFC net-next] bonding: notify when bonding device address changes
From: Stephen Hemminger @ 2011-04-15 19:10 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Nicolas de Pesloüan, Michał Górny, netdev, roy,
	Andy Gospodarek
In-Reply-To: <10227.1302893590@death>

When a device changes its hardware address, it needs to call the network
device notifiers to inform protocols.

Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/bonding/bond_main.c	2011-04-15 11:21:02.142866195 -0700
+++ b/drivers/net/bonding/bond_main.c	2011-04-15 11:28:06.491408825 -0700
@@ -967,9 +967,11 @@ static void bond_do_fail_over_mac(struct
 
 	switch (bond->params.fail_over_mac) {
 	case BOND_FOM_ACTIVE:
-		if (new_active)
+		if (new_active) {
 			memcpy(bond->dev->dev_addr,  new_active->dev->dev_addr,
 			       new_active->dev->addr_len);
+			call_netdevice_notifiers(NETDEV_CHANGEADDR, bond->dev);
+		}
 		break;
 	case BOND_FOM_FOLLOW:
 		/*
@@ -1386,6 +1388,7 @@ static int bond_sethwaddr(struct net_dev
 	pr_debug("slave_dev=%p\n", slave_dev);
 	pr_debug("slave_dev->addr_len=%d\n", slave_dev->addr_len);
 	memcpy(bond_dev->dev_addr, slave_dev->dev_addr, slave_dev->addr_len);
+	call_netdevice_notifiers(NETDEV_CHANGEADDR, bond_dev);
 	return 0;
 }
 
@@ -1644,10 +1647,11 @@ int bond_enslave(struct net_device *bond
 
 	/* If this is the first slave, then we need to set the master's hardware
 	 * address to be the same as the slave's. */
-	if (is_zero_ether_addr(bond->dev->dev_addr))
+	if (is_zero_ether_addr(bond->dev->dev_addr)) {
 		memcpy(bond->dev->dev_addr, slave_dev->dev_addr,
 		       slave_dev->addr_len);
-
+		call_netdevice_notifiers(NETDEV_CHANGEADDR, bond->dev);
+	}
 
 	new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL);
 	if (!new_slave) {
@@ -2067,6 +2071,7 @@ int bond_release(struct net_device *bond
 		 * to the mac address of the first slave
 		 */
 		memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
+		call_netdevice_notifiers(NETDEV_CHANGEADDR, bond_dev);
 
 		if (!bond->vlgrp) {
 			bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
@@ -2252,6 +2257,7 @@ static int bond_release_all(struct net_d
 	 * first slave
 	 */
 	memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
+	call_netdevice_notifiers(NETDEV_CHANGEADDR, bond_dev);
 
 	if (!bond->vlgrp) {
 		bond_dev->features |= NETIF_F_VLAN_CHALLENGED;

^ permalink raw reply

* Re: [PATCH] net: myri10ge: convert to hw_features
From: Jon Mason @ 2011-04-15 18:29 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev, Andrew Gallatin, Brice Goglin
In-Reply-To: <20110415145050.0D65C13A66@rere.qmqm.pl>

On Fri, Apr 15, 2011 at 04:50:50PM +0200, Michał Mirosław wrote:
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> ---
>  drivers/net/myri10ge/myri10ge.c |   66 +++++++-------------------------------
>  1 files changed, 12 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
> index 1446de5..a48eb92 100644
> --- a/drivers/net/myri10ge/myri10ge.c
> +++ b/drivers/net/myri10ge/myri10ge.c
> @@ -205,7 +205,6 @@ struct myri10ge_priv {
>  	int tx_boundary;	/* boundary transmits cannot cross */
>  	int num_slices;
>  	int running;		/* running?             */
> -	int csum_flag;		/* rx_csums?            */

Get rid of MXGEFW_FLAGS_CKSUM in drivers/net/myri10ge/myri10ge_mcp.h,
as this was the only thing using it.

>  	int small_bytes;
>  	int big_bytes;
>  	int max_intr_slots;
> @@ -1386,7 +1385,7 @@ myri10ge_rx_done(struct myri10ge_slice_state *ss, int len, __wsum csum,
>  	skb->protocol = eth_type_trans(skb, dev);
>  	skb_record_rx_queue(skb, ss - &mgp->ss[0]);
>  
> -	if (mgp->csum_flag) {
> +	if (dev->features & NETIF_F_RXCSUM) {
>  		if ((skb->protocol == htons(ETH_P_IP)) ||
>  		    (skb->protocol == htons(ETH_P_IPV6))) {
>  			skb->csum = csum;
> @@ -1757,43 +1756,6 @@ myri10ge_get_ringparam(struct net_device *netdev,
>  	ring->tx_pending = ring->tx_max_pending;
>  }
>  
> -static u32 myri10ge_get_rx_csum(struct net_device *netdev)
> -{
> -	struct myri10ge_priv *mgp = netdev_priv(netdev);
> -
> -	if (mgp->csum_flag)
> -		return 1;
> -	else
> -		return 0;
> -}
> -
> -static int myri10ge_set_rx_csum(struct net_device *netdev, u32 csum_enabled)
> -{
> -	struct myri10ge_priv *mgp = netdev_priv(netdev);
> -	int err = 0;
> -
> -	if (csum_enabled)
> -		mgp->csum_flag = MXGEFW_FLAGS_CKSUM;
> -	else {
> -		netdev->features &= ~NETIF_F_LRO;
> -		mgp->csum_flag = 0;
> -
> -	}
> -	return err;
> -}
> -
> -static int myri10ge_set_tso(struct net_device *netdev, u32 tso_enabled)
> -{
> -	struct myri10ge_priv *mgp = netdev_priv(netdev);
> -	u32 flags = mgp->features & (NETIF_F_TSO6 | NETIF_F_TSO);
> -
> -	if (tso_enabled)
> -		netdev->features |= flags;
> -	else
> -		netdev->features &= ~flags;
> -	return 0;
> -}

ethtool_op_set_tso does not support TSO6.  This would remove the
enable/disable of that feature.

> -
>  static const char myri10ge_gstrings_main_stats[][ETH_GSTRING_LEN] = {
>  	"rx_packets", "tx_packets", "rx_bytes", "tx_bytes", "rx_errors",
>  	"tx_errors", "rx_dropped", "tx_dropped", "multicast", "collisions",
> @@ -1944,11 +1906,6 @@ static u32 myri10ge_get_msglevel(struct net_device *netdev)
>  	return mgp->msg_enable;
>  }
>  
> -static int myri10ge_set_flags(struct net_device *netdev, u32 value)
> -{
> -	return ethtool_op_set_flags(netdev, value, ETH_FLAG_LRO);
> -}
> -
>  static const struct ethtool_ops myri10ge_ethtool_ops = {
>  	.get_settings = myri10ge_get_settings,
>  	.get_drvinfo = myri10ge_get_drvinfo,
> @@ -1957,19 +1914,12 @@ static const struct ethtool_ops myri10ge_ethtool_ops = {
>  	.get_pauseparam = myri10ge_get_pauseparam,
>  	.set_pauseparam = myri10ge_set_pauseparam,
>  	.get_ringparam = myri10ge_get_ringparam,
> -	.get_rx_csum = myri10ge_get_rx_csum,
> -	.set_rx_csum = myri10ge_set_rx_csum,
> -	.set_tx_csum = ethtool_op_set_tx_hw_csum,
> -	.set_sg = ethtool_op_set_sg,
> -	.set_tso = myri10ge_set_tso,
>  	.get_link = ethtool_op_get_link,
>  	.get_strings = myri10ge_get_strings,
>  	.get_sset_count = myri10ge_get_sset_count,
>  	.get_ethtool_stats = myri10ge_get_ethtool_stats,
>  	.set_msglevel = myri10ge_set_msglevel,
>  	.get_msglevel = myri10ge_get_msglevel,
> -	.get_flags = ethtool_op_get_flags,
> -	.set_flags = myri10ge_set_flags
>  };
>  
>  static int myri10ge_allocate_rings(struct myri10ge_slice_state *ss)
> @@ -3136,6 +3086,14 @@ static int myri10ge_set_mac_address(struct net_device *dev, void *addr)
>  	return 0;
>  }
>  
> +static u32 myri10ge_fix_features(struct net_device *dev, u32 features)
> +{
> +	if (!(features & NETIF_F_RXCSUM))
> +		features &= ~NETIF_F_LRO;
> +
> +	return features;
> +}
> +
>  static int myri10ge_change_mtu(struct net_device *dev, int new_mtu)
>  {
>  	struct myri10ge_priv *mgp = netdev_priv(dev);
> @@ -3834,6 +3792,7 @@ static const struct net_device_ops myri10ge_netdev_ops = {
>  	.ndo_get_stats		= myri10ge_get_stats,
>  	.ndo_validate_addr	= eth_validate_addr,
>  	.ndo_change_mtu		= myri10ge_change_mtu,
> +	.ndo_fix_features	= myri10ge_fix_features,
>  	.ndo_set_multicast_list = myri10ge_set_multicast_list,
>  	.ndo_set_mac_address	= myri10ge_set_mac_address,
>  };
> @@ -3860,7 +3819,6 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	mgp = netdev_priv(netdev);
>  	mgp->dev = netdev;
>  	mgp->pdev = pdev;
> -	mgp->csum_flag = MXGEFW_FLAGS_CKSUM;
>  	mgp->pause = myri10ge_flow_control;
>  	mgp->intr_coal_delay = myri10ge_intr_coal_delay;
>  	mgp->msg_enable = netif_msg_init(myri10ge_debug, MYRI10GE_MSG_DEFAULT);
> @@ -3976,11 +3934,11 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	netdev->netdev_ops = &myri10ge_netdev_ops;
>  	netdev->mtu = myri10ge_initial_mtu;
>  	netdev->base_addr = mgp->iomem_base;
> -	netdev->features = mgp->features;
> +	netdev->hw_features = mgp->features | NETIF_F_LRO | NETIF_F_RXCSUM;
> +	netdev->features = netdev->hw_features;
>  
>  	if (dac_enabled)
>  		netdev->features |= NETIF_F_HIGHDMA;
> -	netdev->features |= NETIF_F_LRO;
>  
>  	netdev->vlan_features |= mgp->features;
>  	if (mgp->fw_ver_tiny < 37)
> -- 
> 1.7.2.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH] net: cxgb4{,vf}: convert to hw_features
From: Dimitris Michailidis @ 2011-04-15 19:00 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev, Casey Leedom
In-Reply-To: <20110415145050.6A43913A6A@rere.qmqm.pl>

Michał Mirosław wrote:

> +#define TSO_FLAGS (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
>  #define VLAN_FEAT (NETIF_F_SG | NETIF_F_IP_CSUM | TSO_FLAGS | \
>  		   NETIF_F_IPV6_CSUM | NETIF_F_HIGHDMA)
>  
> @@ -3665,14 +3627,14 @@ static int __devinit init_one(struct pci_dev *pdev,
>  		pi = netdev_priv(netdev);
>  		pi->adapter = adapter;
>  		pi->xact_addr_filt = -1;
> -		pi->rx_offload = RX_CSO;
>  		pi->port_id = i;
>  		netdev->irq = pdev->irq;
>  
> -		netdev->features |= NETIF_F_SG | TSO_FLAGS;
> -		netdev->features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
> -		netdev->features |= NETIF_F_GRO | NETIF_F_RXHASH | highdma;
> -		netdev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
> +		netdev->hw_features = NETIF_F_SG | TSO_FLAGS |
> +			NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> +			NETIF_F_RXCSUM | NETIF_F_RXHASH |
> +			NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
> +		netdev->features |= netdev->hw_features | highdma;
>  		netdev->vlan_features = netdev->features & VLAN_FEAT;

Here vlan_features does not include NETIF_F_RXCSUM but the cxgb4vf bits 
below do include it.  I looked at some other drivers and saw again some 
include it and some don't.  The core VLAN code handles NETIF_F_RXCSUM on its 
own.  Is there some rule for whether drivers should set it in their 
vlan_features or not?

> diff --git a/drivers/net/cxgb4/sge.c b/drivers/net/cxgb4/sge.c
> index 311471b..e8f6f8e 100644
> --- a/drivers/net/cxgb4/sge.c
> +++ b/drivers/net/cxgb4/sge.c
> @@ -1587,7 +1587,7 @@ int t4_ethrx_handler(struct sge_rspq *q, const __be64 *rsp,
>  	pi = netdev_priv(skb->dev);
>  	rxq->stats.pkts++;
>  
> -	if (csum_ok && (pi->rx_offload & RX_CSO) &&
> +	if (csum_ok && (q->netdev->features & NETIF_F_RXCSUM) &&
>  	    (pkt->l2info & htonl(RXF_UDP | RXF_TCP))) {
>  		if (!pkt->ip_frag) {
>  			skb->ip_summed = CHECKSUM_UNNECESSARY;

With this change variable 'pi' can be removed but I can do this cleanup 
after this patch goes in.

> diff --git a/drivers/net/cxgb4vf/cxgb4vf_main.c b/drivers/net/cxgb4vf/cxgb4vf_main.c
> index c662679..04a5c2d 100644
> --- a/drivers/net/cxgb4vf/cxgb4vf_main.c
> +++ b/drivers/net/cxgb4vf/cxgb4vf_main.c
> @@ -2638,14 +2597,13 @@ static int __devinit cxgb4vf_pci_probe(struct pci_dev *pdev,
>  		 * it.
>  		 */
>  		pi->xact_addr_filt = -1;
> -		pi->rx_offload = RX_CSO;
>  		netif_carrier_off(netdev);
>  		netdev->irq = pdev->irq;
>  
> -		netdev->features = (NETIF_F_SG | TSO_FLAGS |
> -				    NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> -				    NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX |
> -				    NETIF_F_GRO);
> +		netdev->hw_features = NETIF_F_SG | TSO_FLAGS | NETIF_F_RXCSUM |
> +			NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> +			NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
> +		netdev->features = netdev->hw_features;
>  		if (pci_using_dac)
>  			netdev->features |= NETIF_F_HIGHDMA;
>  		netdev->vlan_features =

cxgb4vf does not implement toggling of NETIF_F_HW_VLAN_RX so I think the 
flag should be set in features but not hw_features or maybe the driver needs 
to handle it in fix_features?

^ permalink raw reply

* Re: The bonding driver should notify userspace of MAC address change
From: Jay Vosburgh @ 2011-04-15 18:53 UTC (permalink / raw)
  To: =?UTF-8?B?Tmljb2xhcyBkZSBQZXNsb8O8YW4=?=
  Cc: =?UTF-8?B?TWljaGHFgiBHw7Nybnk=?=, netdev, roy, Andy Gospodarek
In-Reply-To: <4DA89114.9040900@gmail.com>

Nicolas de Pesloüan 	<nicolas.2p.debian@gmail.com> wrote:

>Le 15/04/2011 18:44, Michał Górny a écrit :
>> Hello,
>>
>> I'd like to file a feature request for the bonding driver. Currently,
>> there is no way for userspace to know whether the driver actually gets
>> a MAC address. This results in the fact that dhcpcd sends MAC-less DHCP
>> packets through bonding device if it is started before bond gets any
>> slaves.
>
>A similar subject, involving bridge instead of bonding, was discussed a
>few weeks ago in this thread:
>http://marc.info/?l=linux-netdev&m=129939017116310&w=2
>
>In particular, I suggested to apply Stephen's suggestion not only to bridge but also to bonding.
>
>(http://marc.info/?l=linux-netdev&m=129948385024680&w=2)
>
>A bonding device should not report link up to userspace until at least one slave is present and up.
>
>And possibly, a bonding device should report link down if all slaves are down or all slave were removed.
>
>Jay, Andy, does this sounds sensible to you?

	I was just reading their bug and doing an experiment; I don't
see that bonding reports carrier up until there's at least one slave
(even if it's configured up), e.g.,

# modprobe bonding
# ifconfig bond0 up
# cat /sys/class/net/bond0/carrier
0
# echo +eth5 > /sys/class/net/bond0/bonding/slaves
# cat /sys/class/net/bond0/carrier
1

	If there's a slave, there's a MAC assigned, since bond_enslave
sets the master's MAC before it calls bond_set_carrier.

	In bond_create, as soon as register_netdevice returns, we call
netif_carrier_off, and it stays off until bond_enslave runs
successfully.

	Is there some race window there between the register and the
netif_carrier_off?

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH] net: myri10ge: convert to hw_features
From: Michał Mirosław @ 2011-04-15 18:47 UTC (permalink / raw)
  To: Jon Mason; +Cc: netdev, Andrew Gallatin, Brice Goglin
In-Reply-To: <20110415182922.GA2458@myri.com>

On Fri, Apr 15, 2011 at 01:29:22PM -0500, Jon Mason wrote:
> On Fri, Apr 15, 2011 at 04:50:50PM +0200, Michał Mirosław wrote:
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > ---
> >  drivers/net/myri10ge/myri10ge.c |   66 +++++++-------------------------------
> >  1 files changed, 12 insertions(+), 54 deletions(-)
[...]
> > -static int myri10ge_set_tso(struct net_device *netdev, u32 tso_enabled)
> > -{
> > -	struct myri10ge_priv *mgp = netdev_priv(netdev);
> > -	u32 flags = mgp->features & (NETIF_F_TSO6 | NETIF_F_TSO);
> > -
> > -	if (tso_enabled)
> > -		netdev->features |= flags;
> > -	else
> > -		netdev->features &= ~flags;
> > -	return 0;
> > -}
> ethtool_op_set_tso does not support TSO6.  This would remove the
> enable/disable of that feature.

Please test this. You'll see it still works.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: The bonding driver should notify userspace of MAC address change
From: Nicolas de Pesloüan @ 2011-04-15 18:40 UTC (permalink / raw)
  To: Michał Górny; +Cc: netdev, roy, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <20110415184407.550abd88@pomiocik.lan>

Le 15/04/2011 18:44, Michał Górny a écrit :
> Hello,
>
> I'd like to file a feature request for the bonding driver. Currently,
> there is no way for userspace to know whether the driver actually gets
> a MAC address. This results in the fact that dhcpcd sends MAC-less DHCP
> packets through bonding device if it is started before bond gets any
> slaves.

A similar subject, involving bridge instead of bonding, was discussed a few weeks ago in this 
thread: http://marc.info/?l=linux-netdev&m=129939017116310&w=2

In particular, I suggested to apply Stephen's suggestion not only to bridge but also to bonding.

(http://marc.info/?l=linux-netdev&m=129948385024680&w=2)

A bonding device should not report link up to userspace until at least one slave is present and up.

And possibly, a bonding device should report link down if all slaves are down or all slave were removed.

Jay, Andy, does this sounds sensible to you?

> I've reported that problem upstream [1], and the author suggested that
> the bonding driver would have to notify the userspace about MAC address
> change, suggesting using RTM_NEWLINK message.
>
> I wanted to write a patch for that but I don't seem to see any
> appropriate IFF_* flag for that particular kind of event. dhcpcd author
> suggested using 'ifi->ifi_change = ~0U' but I'm not sure if it's
> appropriate.
>
> Could you either add such a kind of notification or give me a tip on
> how to proceed with adding it? Thanks in advance.
>
> [1] http://roy.marples.name/projects/dhcpcd/ticket/212


^ permalink raw reply

* Re: [PATCH] net: myri10ge: convert to hw_features
From: Andrew Gallatin @ 2011-04-15 18:36 UTC (permalink / raw)
  To: Jon Mason; +Cc: Michał Mirosław, netdev, Brice Goglin
In-Reply-To: <20110415182922.GA2458@myri.com>

On 04/15/11 14:29, Jon Mason wrote:
> On Fri, Apr 15, 2011 at 04:50:50PM +0200, Michał Mirosław wrote:
>> Signed-off-by: Michał Mirosław<mirq-linux@rere.qmqm.pl>
>> ---
>>   drivers/net/myri10ge/myri10ge.c |   66 +++++++-------------------------------
>>   1 files changed, 12 insertions(+), 54 deletions(-)
>>
>> diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
>> index 1446de5..a48eb92 100644
>> --- a/drivers/net/myri10ge/myri10ge.c
>> +++ b/drivers/net/myri10ge/myri10ge.c
>> @@ -205,7 +205,6 @@ struct myri10ge_priv {
>>   	int tx_boundary;	/* boundary transmits cannot cross */
>>   	int num_slices;
>>   	int running;		/* running?             */
>> -	int csum_flag;		/* rx_csums?            */
>
> Get rid of MXGEFW_FLAGS_CKSUM in drivers/net/myri10ge/myri10ge_mcp.h,
> as this was the only thing using it.
>

No, please don't.  MXGEFW_FLAGS_CKSUM is a TX descriptor flag that was 
(ab)used as a device state flag as well. See flags in myri10ge_xmit(). 
I think early in the development process, the value of  mgp->csum_flag 
was directly assigned into the descriptor, which is why they shared the 
value.

Drew

^ permalink raw reply

* Re: [RFC v3 5/6] j1939: rename NAME to UUID?
From: Oliver Hartkopp @ 2011-04-15 17:57 UTC (permalink / raw)
  To: Kurt Van Dijck
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20110413044928.GA289-ozGf4kBk5synFtIcQ8t7k3L8HoS0Hn3T@public.gmane.org>

On 13.04.2011 06:49, Kurt Van Dijck wrote:
> Oliver et.al.,
> 
> On Sun, Mar 20, 2011 at 04:56:46PM +0100, Oliver Hartkopp wrote:
>> On 14.03.2011 14:59, Kurt Van Dijck wrote:
>>
>> Then you suggest to attach static and/or dynamic addresses to the interface.
>>
>>> +  Assigning addresses is done via
>>> +  $ ip addr add dev canX j1939 0xXX
>>> +  statically or
>>> +  $ ip addr add dev canX j1939 name 0xXX
>>> +  dynamically. In the latter case, address claiming must take place
>>> +  before other traffic can leave.
>>
>> like you would have using DHCP/DNS (adapted for j1939) ...
>>
> I suspect the confustion with DHCP/DNS comes free with the used terminology.
> 
> Specifications talk about a 64bit NAME, where is actually is a 64bit UUID.
> Calling this number a UUID may clarify things, but leaves the spec in the
> terminology.
> 
> one would then do:
> $ ip addr add dev canX j1939 uuid XXXX
> 
> Would that be a good way to progress?

Hello Kurt,

i don't know if it helps - at least for j1939 users - to rename the NAME for
j1939 address claiming to UUID which is usually 128 bit long an has a pretty
different understanding than the J1939 NAME which stands for

   1. Arbitrary address bit
   2. Industry group, length 3 bits
   3. Vehicle system instance, length 4 bits
   4. Vehicle system, length 7 bits
   5. Reserved bit
   6. Function, length 8 bits
   7. Function instance, length 5 bits
   8. ECU instance, length 3 bits
   9. Manufacturer code, length 11 bits
  10. Identity number, length 21 bits

(from http://www.kvaser.com/en/about-can/higher-layer-protocols/36.html)

This is not comparable to the ideas from RFC 4122 ...

Thinking about the approach to implement the j1939 address claiming (AC) in
userspace, i discovered two ways which could both be hidden inside some
easy-to-use helper functions:

1. implement a thread (e.g. within a library) which opens a CAN_RAW socket on
a specific CAN-interface and takes care of the AC procedure and monitors
ongoing AC procedures on the bus. In this case every j1939 application
requiring AC internally would monitor all the AC handling on itself (which
should be no general problem - written only once).

2. create j1939ac daemon(s) using PF_UNIX-sockets to be named e.g.
j1939ac_can0, j1939ac_can1, etc. - these daemons take care for all AC
requirements of the host it is running on. The PF_UNIX-sockets are used in
SOCK_DGRAM mode and only the j1939 processes that need AC can then register
their NAME by sending a request datagram, and get back the j1939-address once
it is claimed (and all the updates on changes). As the j1939ac daemons are
running on the same host as the j1939 application processes, optional the
process' PID could be provided to the daemon during the registering process,
so that the daemon can send a signal to a signal handler of the application
process (if you would like to omit the select() syscall to handle both the
j1939 and PF_UNIX sockets).

->   <Req><Name="A3B5667799332242" PID="12345">
<-   <Resp><ACState="claimed" Name="A3B5667799332242" Address="1B">
(some time)
<-   <Resp><ACState="changed" Name="A3B5667799332242" Address="1C">

This is a sketch that could be put into simple C-structs that are sent via the
PF_UNIX DGRAM socket.

In all suggested cases (using a thread, daemon with/without signal) the AC
procedure can be managed in userspace without real pain. But especially with
less pain than putting the AC process into kernelspace and provide your
suggested socket API with bind/connect/... in very different manners.

Regards,
Oliver

^ permalink raw reply

* Re: [PATCH] net: forcedeth: convert to hw_features
From: Michał Mirosław @ 2011-04-15 17:44 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev
In-Reply-To: <1302885588.2845.4.camel@bwh-desktop>

On Fri, Apr 15, 2011 at 05:39:48PM +0100, Ben Hutchings wrote:
> On Fri, 2011-04-15 at 16:50 +0200, Michał Mirosław wrote:
> > This also fixes a race around np->txrxctl_bits while changing RXCSUM offload.
> > 
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > ---
> >  drivers/net/forcedeth.c |   78 +++++++++++++++-------------------------------
> >  1 files changed, 26 insertions(+), 52 deletions(-)
> > 
> > diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
> > index d5ab4da..ec9a32d 100644
> > --- a/drivers/net/forcedeth.c
> > +++ b/drivers/net/forcedeth.c
> > @@ -774,7 +774,6 @@ struct fe_priv {
> >  	u32 driver_data;
> >  	u32 device_id;
> >  	u32 register_size;
> > -	int rx_csum;
> >  	u32 mac_in_use;
> >  	int mgmt_version;
> >  	int mgmt_sema;
> > @@ -4480,58 +4479,36 @@ static int nv_set_pauseparam(struct net_device *dev, struct ethtool_pauseparam*
> >  	return 0;
> >  }
> >  
> > -static u32 nv_get_rx_csum(struct net_device *dev)
> > +static u32 nv_fix_features(struct net_device *dev, u32 features)
> >  {
> > -	struct fe_priv *np = netdev_priv(dev);
> > -	return np->rx_csum != 0;
> > +	/* vlan is dependent on rx checksum offload */
> > +	if (features & (NETIF_F_HW_VLAN_TX|NETIF_F_HW_VLAN_RX))
> > +		features |= NETIF_F_RXCSUM;
> [...]
> 
> Shouldn't this be done the other way round:
> 
> 	if (!(features & NETIF_F_RXCSUM))
> 		features &= ~(NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX);
> 
> So long as the VLAN feature flags are still set in wanted_features, they
> will be turned back on automatically if RXCSUM is re-enabled.

Yes, but this way is a direct translation from old ethtool_ops. Changing
this needs implementing changing HW_VLAN features in set_features.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [RFC] possible bug in inet->opt handling
From: Herbert Xu @ 2011-04-15 17:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <20110415171715.GA5648@gondor.apana.org.au>

On Sat, Apr 16, 2011 at 01:17:15AM +0800, Herbert Xu wrote:
> On Fri, Apr 15, 2011 at 05:39:54PM +0200, Eric Dumazet wrote:
> >
> > My plan is to add RCU protection on inet->opt, unless someone has better
> > idea ?
> 
> inet->opt is rarely non-NULL.  So perhaps just throw some locks
> around the memcpy.

Ah I missed your other point about inet->opt going away.  The
other option would be to always kmalloc/memcpy in udp_sendmsg
and have ip_setup_cork simply steal the reference from ipc.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [RFC] possible bug in inet->opt handling
From: Herbert Xu @ 2011-04-15 17:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1302881994.3613.34.camel@edumazet-laptop>

On Fri, Apr 15, 2011 at 05:39:54PM +0200, Eric Dumazet wrote:
>
> My plan is to add RCU protection on inet->opt, unless someone has better
> idea ?

inet->opt is rarely non-NULL.  So perhaps just throw some locks
around the memcpy.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: BUG: unable to handle kernel NULL pointer dereference at 000002c0 / IP: [<c04c70f2>] in6_dev_finish_destroy+0x35/0x8c
From: Simon Arlott @ 2011-04-15 16:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Linux Kernel Mailing List, netdev,
	Netfilter Development Mailinglist
In-Reply-To: <4DA86FE5.8080507@simon.arlott.org.uk>

and again with the patch reverted...

[  470.965098] BUG: unable to handle kernel paging request at a1fd3e8b
[  470.966008] IP: [<c04d89a7>] icmpv6_send+0x5c3/0x6e2
[  470.966008] *pdpt = 00000000318f2001 *pde = 0000000000000000 
[  470.966008] Oops: 0002 [#1] PREEMPT SMP 
[  470.966008] last sysfs file: /sys/devices/platform/it87.552/cpu0_vid
[  470.966008] Modules linked in: nf_conntrack_ipv6 xt_tcpmss xt_length xt_TCPMSS ppp_synctty sch_sfq xt_u32 xt_CLASSIFY sch_htb ppp_async rfcomm bnep l2cap crc16 nfsd lockd sunrpc exportfs xt_state ip6t_LOG ip]
[  470.966008] 
[  470.966008] Pid: 3, comm: ksoftirqd/0 Not tainted 2.6.35.4-git+ #git+ GA-MA69VM-S2/GA-MA69VM-S2
[  470.966008] EIP: 0060:[<c04d89a7>] EFLAGS: 00010286 CPU: 0
[  470.966008] EIP is at icmpv6_send+0x5c3/0x6e2
[  470.966008] EAX: 00000000 EBX: a1fd3daf ECX: 00000000 EDX: 00000001
[  470.966008] ESI: f6f1adb4 EDI: 00000000 EBP: f7483c4c ESP: f7483b48
[  470.966008]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  470.966008] Process ksoftirqd/0 (pid: 3, ti=f7482000 task=f74800a0 task.ti=f7482000)
[  470.966008] Stack:
[  470.966008]  f493fec0 f7483b5c c0513fe0 00033acf 00033ab5 f7483b7c c022e74d 00000046
[  470.966008] <0> fffffd8a 00033acf 00000001 0101001a f1a2984c 00000500 f6f1aac0 f6f1adb4
[  470.966008] <0> f1a2985c 00000000 00000040 f6f1aaf0 00000000 00000000 00000000 b0060120
[  470.966008] Call Trace:
[  470.966008]  [<c0513fe0>] ? _raw_spin_unlock_irqrestore+0x42/0x58
[  470.966008]  [<c022e74d>] ? release_console_sem+0x197/0x1c4
[  470.966008]  [<fa73c0b5>] ? reject_tg6+0x70/0x43f [ip6t_REJECT]
[  470.966008]  [<fa7619b1>] ? ip6t_log_packet+0x15d/0x167 [ip6t_LOG]
[  470.966008]  [<c024e201>] ? trace_hardirqs_on+0xb/0xd
[  470.966008]  [<c0232a72>] ? local_bh_enable_ip+0x97/0xad
[  470.966008]  [<c0513f59>] ? _raw_spin_unlock_bh+0x2f/0x32
[  470.966008]  [<fa7619b1>] ? ip6t_log_packet+0x15d/0x167 [ip6t_LOG]
[  470.966008]  [<fa6290f0>] ? ipv6_find_hdr+0x48/0x164 [ip6_tables]
[  470.966008]  [<fa6297c1>] ? ip6t_do_table+0x4c8/0x53e [ip6_tables]
[  470.966008]  [<fa6cf0f0>] ? ip6table_mangle_hook+0xf0/0x100 [ip6table_mangle]
[  470.966008]  [<fa634018>] ? ip6table_filter_hook+0x18/0x20 [ip6table_filter]
[  470.966008]  [<c046ee87>] ? nf_iterate+0x2f/0x62
[  470.966008]  [<c04c40c8>] ? ip6_input_finish+0x0/0x3db
[  470.966008]  [<c046f088>] ? nf_hook_slow+0x63/0xeb
[  470.966008]  [<c04c40c8>] ? ip6_input_finish+0x0/0x3db
[  470.966008]  [<c04c44d6>] ? ip6_input+0x33/0x47
[  470.966008]  [<c04c40c8>] ? ip6_input_finish+0x0/0x3db
[  470.966008]  [<c04c4775>] ? ip6_rcv_finish+0x8b/0x8e
[  470.966008]  [<fc81ea3a>] ? nf_ct_frag6_output+0x7c/0x95 [nf_conntrack_ipv6]
[  470.966008]  [<fc81e45c>] ? ipv6_defrag+0x87/0x9f [nf_conntrack_ipv6]
[  470.966008]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[  470.966008]  [<c046ee87>] ? nf_iterate+0x2f/0x62
[  470.966008]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[  470.966008]  [<c046f088>] ? nf_hook_slow+0x63/0xeb
[  470.966008]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[  470.966008]  [<c04c4aff>] ? ipv6_rcv+0x387/0x47c
[  470.966008]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[  470.966008]  [<c0455065>] ? __netif_receive_skb+0x367/0x3b6
[  470.966008]  [<c0455142>] ? process_backlog+0x8e/0x146
[  470.966008]  [<c0455c3b>] ? net_rx_action+0x62/0x119
[  470.966008]  [<c0232750>] ? __do_softirq+0x8b/0x10a
[  470.966008]  [<c02327fa>] ? do_softirq+0x2b/0x43
[  470.966008]  [<c0232885>] ? run_ksoftirqd+0x73/0x155
[  470.966008]  [<c0232812>] ? run_ksoftirqd+0x0/0x155
[  470.966008]  [<c023fdbd>] ? kthread+0x61/0x66
[  470.966008]  [<c023fd5c>] ? kthread+0x0/0x66
[  470.966008]  [<c0202c7a>] ? kernel_thread_helper+0x6/0x1a
[  470.966008] Code: e8 1b da d4 ff 68 48 89 4d c0 31 c9 31 d2 b8 58 11 68 c0 6a 00 6a 01 6a 02 e8 37 76 d7 ff 8b 9b 60 01 00 00 83 c4 10 85 db 74 07 <f0> ff 83 dc 00 00 00 b9 ae 89 4d c0 ba 01 00 00 00 b8 58 1 
[  470.966008] EIP: [<c04d89a7>] icmpv6_send+0x5c3/0x6e2 SS:ESP 0068:f7483b48
[  470.966008] CR2: 00000000a1fd3e8b
[  471.387732] ---[ end trace a325ca681eff783c ]---
[  471.388770] __iptables__: l2tp_2 IN=aaisp3 OUT= MAC= SRC=2001:0678:0001:0000:0000:0000:0000:0001 DST=2001:08b0:ffea:0000:0053:4150:5841:0001 LEN=430 TC=0 HOPLIMIT=60 FLOWLBL=0 PROTO=UDP SPT=53 DPT=22008 LEN= 
[  471.388833] __iptables__: l2tp_2 IN=aaisp3 OUT= MAC= SRC=80.68.89.159 DST=81.2.80.67 LEN=143 TOS=0x00 PREC=0x00 TTL=59 ID=12462 PROTO=UDP SPT=53 DPT=36911 LEN=123 
[  471.388874] __iptables__: l2tp_2 IN=aaisp3 OUT= MAC= SRC=208.94.149.2 DST=81.2.80.67 LEN=116 TOS=0x00 PREC=0x00 TTL=58 ID=22935 PROTO=UDP SPT=53 DPT=10068 LEN=96 
[  471.443611] Kernel panic - not syncing: Fatal exception in interrupt
[  471.444704] __iptables__: l2tp_2 IN=aaisp3 OUT= MAC= SRC=208.94.148.2 DST=81.2.80.67 LEN=120 TOS=0x00 PREC=0x00 TTL=58 ID=41552 PROTO=UDP SPT=53 DPT=27444 LEN=100 
[  471.444739] __iptables__: l2tp_2 IN=aaisp3 OUT= MAC= SRC=2a01:06d0:0001:0000:0000:0000:0000:0002 DST=2001:08b0:ffea:0000:0053:4150:5841:0001 LEN=109 TC=0 HOPLIMIT=56 FLOWLBL=0 PROTO=UDP SPT=53 DPT=31279 LEN= 
[  471.484694] Pid: 3, comm: ksoftirqd/0 Tainted: G      D     2.6.35.4-git+ #git+
[  471.492318] Call Trace:
[  471.494885]  [<c0511194>] ? printk+0xf/0x13
[  471.499161]  [<c0511116>] panic+0x55/0xc4
[  471.503331]  [<c02050ed>] oops_end+0x6e/0x7c
[  471.507768]  [<c021a514>] no_context+0x13f/0x149
[  471.512534]  [<c021a657>] __bad_area_nosemaphore+0x139/0x141
[  471.518341]  [<c04cef8d>] ? fib6_lookup+0x48/0x5c
[  471.523203]  [<c04cdd75>] ? ip6_pol_route+0x208/0x223
[  471.528422]  [<c024e201>] ? trace_hardirqs_on+0xb/0xd
[  471.533608]  [<c0232a72>] ? local_bh_enable_ip+0x97/0xad
[  471.539055]  [<c0513d08>] ? _raw_read_unlock_bh+0x2f/0x32
[  471.544620]  [<c04cdd75>] ? ip6_pol_route+0x208/0x223
[  471.549821]  [<c021a66c>] bad_area_nosemaphore+0xd/0x10
[  471.555192]  [<c021a910>] do_page_fault+0x14e/0x302
[  471.560145]  [<c04b3486>] ? __xfrm_lookup+0x32d/0x38b
[  471.565396]  [<c04e8bbe>] ? fib6_rule_lookup+0x35/0x77
[  471.570720]  [<c021a7c2>] ? do_page_fault+0x0/0x302
[  471.575807]  [<c051499b>] error_code+0x6b/0x70
[  471.580443]  [<c021a7c2>] ? do_page_fault+0x0/0x302
[  471.585476]  [<c04d89a7>] ? icmpv6_send+0x5c3/0x6e2
[  471.590527]  [<c0513fe0>] ? _raw_spin_unlock_irqrestore+0x42/0x58
[  471.596755]  [<c022e74d>] ? release_console_sem+0x197/0x1c4
[  471.602547]  [<fa73c0b5>] reject_tg6+0x70/0x43f [ip6t_REJECT]
[  471.608472]  [<fa7619b1>] ? ip6t_log_packet+0x15d/0x167 [ip6t_LOG]
[  471.614834]  [<c024e201>] ? trace_hardirqs_on+0xb/0xd
[  471.620120]  [<c0232a72>] ? local_bh_enable_ip+0x97/0xad
[  471.625575]  [<c0513f59>] ? _raw_spin_unlock_bh+0x2f/0x32
[  471.631145]  [<fa7619b1>] ? ip6t_log_packet+0x15d/0x167 [ip6t_LOG]
[  471.637519]  [<fa6290f0>] ? ipv6_find_hdr+0x48/0x164 [ip6_tables]
[  471.643794]  [<fa6297c1>] ip6t_do_table+0x4c8/0x53e [ip6_tables]
[  471.650014]  [<fa6cf0f0>] ? ip6table_mangle_hook+0xf0/0x100 [ip6table_mangle]
[  471.657364]  [<fa634018>] ip6table_filter_hook+0x18/0x20 [ip6table_filter]
[  471.664419]  [<c046ee87>] nf_iterate+0x2f/0x62
[  471.668935]  [<c04c40c8>] ? ip6_input_finish+0x0/0x3db
[  471.674231]  [<c046f088>] nf_hook_slow+0x63/0xeb
[  471.678999]  [<c04c40c8>] ? ip6_input_finish+0x0/0x3db
[  471.684321]  [<c04c44d6>] ip6_input+0x33/0x47
[  471.688851]  [<c04c40c8>] ? ip6_input_finish+0x0/0x3db
[  471.694097]  [<c04c4775>] ip6_rcv_finish+0x8b/0x8e
[  471.699002]  [<fc81ea3a>] nf_ct_frag6_output+0x7c/0x95 [nf_conntrack_ipv6]
[  471.706039]  [<fc81e45c>] ipv6_defrag+0x87/0x9f [nf_conntrack_ipv6]
[  471.712470]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[  471.717471]  [<c046ee87>] nf_iterate+0x2f/0x62
[  471.722013]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[  471.727013]  [<c046f088>] nf_hook_slow+0x63/0xeb
[  471.731703]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[  471.736764]  [<c04c4aff>] ipv6_rcv+0x387/0x47c
[  471.741384]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[  471.746438]  [<c0455065>] __netif_receive_skb+0x367/0x3b6
[  471.752011]  [<c0455142>] process_backlog+0x8e/0x146
[  471.757063]  [<c0455c3b>] net_rx_action+0x62/0x119
[  471.761994]  [<c0232750>] __do_softirq+0x8b/0x10a
[  471.766822]  [<c02327fa>] do_softirq+0x2b/0x43
[  471.771354]  [<c0232885>] run_ksoftirqd+0x73/0x155
[  471.776252]  [<c0232812>] ? run_ksoftirqd+0x0/0x155
[  471.781253]  [<c023fdbd>] kthread+0x61/0x66
[  471.785544]  [<c023fd5c>] ? kthread+0x0/0x66
[  471.789957]  [<c0202c7a>] kernel_thread_helper+0x6/0x1a
[  471.795306] Rebooting in 10 seconds..

-- 
Simon Arlott

^ permalink raw reply

* The bonding driver should notify userspace of MAC address change
From: Michał Górny @ 2011-04-15 16:44 UTC (permalink / raw)
  To: netdev; +Cc: mgorny, roy

[-- Attachment #1: Type: text/plain, Size: 937 bytes --]

Hello,

I'd like to file a feature request for the bonding driver. Currently,
there is no way for userspace to know whether the driver actually gets
a MAC address. This results in the fact that dhcpcd sends MAC-less DHCP
packets through bonding device if it is started before bond gets any
slaves.

I've reported that problem upstream [1], and the author suggested that
the bonding driver would have to notify the userspace about MAC address
change, suggesting using RTM_NEWLINK message.

I wanted to write a patch for that but I don't seem to see any
appropriate IFF_* flag for that particular kind of event. dhcpcd author
suggested using 'ifi->ifi_change = ~0U' but I'm not sure if it's
appropriate.

Could you either add such a kind of notification or give me a tip on
how to proceed with adding it? Thanks in advance.

[1] http://roy.marples.name/projects/dhcpcd/ticket/212

-- 
Best regards,
Michał Górny

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH] net: forcedeth: convert to hw_features
From: Ben Hutchings @ 2011-04-15 16:39 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev
In-Reply-To: <20110415145049.CC33613A65@rere.qmqm.pl>

On Fri, 2011-04-15 at 16:50 +0200, Michał Mirosław wrote:
> This also fixes a race around np->txrxctl_bits while changing RXCSUM offload.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> ---
>  drivers/net/forcedeth.c |   78 +++++++++++++++-------------------------------
>  1 files changed, 26 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
> index d5ab4da..ec9a32d 100644
> --- a/drivers/net/forcedeth.c
> +++ b/drivers/net/forcedeth.c
> @@ -774,7 +774,6 @@ struct fe_priv {
>  	u32 driver_data;
>  	u32 device_id;
>  	u32 register_size;
> -	int rx_csum;
>  	u32 mac_in_use;
>  	int mgmt_version;
>  	int mgmt_sema;
> @@ -4480,58 +4479,36 @@ static int nv_set_pauseparam(struct net_device *dev, struct ethtool_pauseparam*
>  	return 0;
>  }
>  
> -static u32 nv_get_rx_csum(struct net_device *dev)
> +static u32 nv_fix_features(struct net_device *dev, u32 features)
>  {
> -	struct fe_priv *np = netdev_priv(dev);
> -	return np->rx_csum != 0;
> +	/* vlan is dependent on rx checksum offload */
> +	if (features & (NETIF_F_HW_VLAN_TX|NETIF_F_HW_VLAN_RX))
> +		features |= NETIF_F_RXCSUM;
[...]

Shouldn't this be done the other way round:

	if (!(features & NETIF_F_RXCSUM))
		features &= ~(NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX);

So long as the VLAN feature flags are still set in wanted_features, they
will be turned back on automatically if RXCSUM is re-enabled.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: BUG: unable to handle kernel NULL pointer dereference at 000002c0 / IP: [<c04c70f2>] in6_dev_finish_destroy+0x35/0x8c
From: Simon Arlott @ 2011-04-15 16:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Linux Kernel Mailing List, netdev,
	Netfilter Development Mailinglist
In-Reply-To: <1302873876.3613.11.camel@edumazet-laptop>

On 15/04/11 14:24, Eric Dumazet wrote:
> Hmm.. a more complete patch :
> 
> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
> index 0857272..6f0bed0 100644

I applied the patch by recompiling and then reloading the nf_conntrack_ipv6
module (temporarily flushing and then restoring all ip6tables rules).
Then this happened 10 minutes later:

[33876.950100] BUG: unable to handle kernel NULL pointer dereference at 00000014
[33876.951060] IP: [<f9b012bb>] nf_ct_frag6_gather+0x864/0x881 [nf_conntrack_ipv6]
[33876.951060] *pdpt = 0000000033491001 *pde = 0000000000000000 
[33876.951060] Oops: 0002 [#1] PREEMPT SMP 
[33876.951060] last sysfs file: /sys/devices/platform/it87.552/cpu0_vid
[33876.951060] Modules linked in: nf_conntrack_ipv6 xt_tcpmss xt_length xt_TCPMSS ppp_synctty sch_sfq xt_u32 xt_CLASSIFY sch_htb ppp_async nfsd lockd sunrpc bnep exportfs rfcomm l2cap crc16 xt_state ip6t_LOG ip]
[33876.951060] 
[33876.951060] Pid: 7, comm: ksoftirqd/1 Not tainted 2.6.35.4-git+ #git+ GA-MA69VM-S2/GA-MA69VM-S2
[33876.951060] EIP: 0060:[<f9b012bb>] EFLAGS: 00010246 CPU: 1
[33876.951060] EIP is at nf_ct_frag6_gather+0x864/0x881 [nf_conntrack_ipv6]
[33877.071165] EAX: f68e1800 EBX: 00000000 ECX: f560f3c0 EDX: f74921a0
[33877.071165] ESI: 00000000 EDI: f636f200 EBP: f7495e34 ESP: f7495ddc
[33877.071165]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[33877.071165] Process ksoftirqd/1 (pid: 7, ti=f7494000 task=f74921a0 task.ti=f7494000)
[33877.071165] Stack:
[33877.071165]  00000001 f5d6c8c0 f636f218 726b4c79 f68e1800 062c1158 f226d06c f560f3c0
[33877.071165] <0> f560f3d4 000005a8 00000000 f74921a0 00000001 00000000 00000000 726b4c79
[33877.071165] <0> 00000001 f226d04c f226d05c f5d6c8c0 00000000 f68e1800 f7495e48 f9b0043e
[33877.071165] Call Trace:
[33877.071165]  [<f9b0043e>] ? ipv6_defrag+0x69/0x9f [nf_conntrack_ipv6]
[33877.071165]  [<c046ee87>] ? nf_iterate+0x2f/0x62
[33877.071165]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[33877.071165]  [<c046f088>] ? nf_hook_slow+0x63/0xeb
[33877.071165]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[33877.071165]  [<c04c4aff>] ? ipv6_rcv+0x387/0x47c
[33877.071165]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[33877.071165]  [<c0455065>] ? __netif_receive_skb+0x367/0x3b6
[33877.071165]  [<c0455142>] ? process_backlog+0x8e/0x146
[33877.071165]  [<c0455c3b>] ? net_rx_action+0x62/0x119
[33877.071165]  [<c0232750>] ? __do_softirq+0x8b/0x10a
[33877.071165]  [<c02327fa>] ? do_softirq+0x2b/0x43
[33877.071165]  [<c0232885>] ? run_ksoftirqd+0x73/0x155
[33877.071165]  [<c0232812>] ? run_ksoftirqd+0x0/0x155
[33877.071165]  [<c023fdbd>] ? kthread+0x61/0x66
[33877.071165]  [<c023fd5c>] ? kthread+0x0/0x66
[33877.071165]  [<c0202c7a>] ? kernel_thread_helper+0x6/0x1a
[33877.071165] Code: 02 31 db 8b 45 c8 e8 8f 2c a1 c6 8b 4d c4 f0 ff 49 30 0f 94 c0 84 c0 74 0f 8b 45 c4 31 c9 ba 78 1a b0 f9 e8 38 fe 99 c6 8b 45 b8 <89> 43 14 89 5d ac eb 07 89 f8 e8 11 e3 94 c6 8b 45 ac 8d 6 
[33877.071165] EIP: [<f9b012bb>] nf_ct_frag6_gather+0x864/0x881 [nf_conntrack_ipv6] SS:ESP 0068:f7495ddc
[33877.071165] CR2: 0000000000000014
[33877.253064] ---[ end trace 91cffe982fd021cc ]---
[33877.257847] Kernel panic - not syncing: Fatal exception in interrupt
[33877.264339] Pid: 7, comm: ksoftirqd/1 Tainted: G      D     2.6.35.4-git+ #git+
[33877.271842] Call Trace:
[33877.274420]  [<c0511194>] ? printk+0xf/0x13
[33877.278743]  [<c0511116>] panic+0x55/0xc4
[33877.282860]  [<c02050ed>] oops_end+0x6e/0x7c
[33877.287239]  [<c021a514>] no_context+0x13f/0x149
[33877.291988]  [<c021a657>] __bad_area_nosemaphore+0x139/0x141
[33877.297802]  [<c0224fb6>] ? task_rq_lock+0x36/0x60
[33877.302760]  [<c021a66c>] bad_area_nosemaphore+0xd/0x10
[33877.308107]  [<c021a910>] do_page_fault+0x14e/0x302
[33877.313119]  [<c0513a46>] ? _raw_spin_lock_irqsave+0x35/0x3e
[33877.318985]  [<c0513fe0>] ? _raw_spin_unlock_irqrestore+0x42/0x58
[33877.325261]  [<c021a7c2>] ? do_page_fault+0x0/0x302
[33877.330306]  [<c051499b>] error_code+0x6b/0x70
[33877.334854]  [<c021a7c2>] ? do_page_fault+0x0/0x302
[33877.339926]  [<f9b012bb>] ? nf_ct_frag6_gather+0x864/0x881 [nf_conntrack_ipv6]
[33877.347451]  [<f9b0043e>] ipv6_defrag+0x69/0x9f [nf_conntrack_ipv6]
[33877.353958]  [<c046ee87>] nf_iterate+0x2f/0x62
[33877.358560]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[33877.363588]  [<c046f088>] nf_hook_slow+0x63/0xeb
[33877.368322]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[33877.373388]  [<c04c4aff>] ipv6_rcv+0x387/0x47c
[33877.377965]  [<c04c46ea>] ? ip6_rcv_finish+0x0/0x8e
[33877.383022]  [<c0455065>] __netif_receive_skb+0x367/0x3b6
[33877.388558]  [<c0455142>] process_backlog+0x8e/0x146
[33877.393715]  [<c0455c3b>] net_rx_action+0x62/0x119
[33877.398664]  [<c0232750>] __do_softirq+0x8b/0x10a
[33877.403554]  [<c02327fa>] do_softirq+0x2b/0x43
[33877.408154]  [<c0232885>] run_ksoftirqd+0x73/0x155
[33877.413051]  [<c0232812>] ? run_ksoftirqd+0x0/0x155
[33877.418053]  [<c023fdbd>] kthread+0x61/0x66
[33877.422360]  [<c023fd5c>] ? kthread+0x0/0x66
[33877.426735]  [<c0202c7a>] kernel_thread_helper+0x6/0x1a

-- 
Simon Arlott

^ permalink raw reply

* Re: Suspend/resume - slow resume
From: Linus Torvalds @ 2011-04-15 16:14 UTC (permalink / raw)
  To: Ciprian Docan, netdev, Linux Kernel Mailing List
  Cc: Francois Romieu, Len Brown, Pavel Machek, Rafael, J. Wysocki
In-Reply-To: <Pine.SOC.4.64.1104151122330.1320@er3.rutgers.edu>

On Fri, Apr 15, 2011 at 8:33 AM, Ciprian Docan <docan@eden.rutgers.edu> wrote:
>
> I enabled CONFIG_PRINTK_TIME in the config file and rebuild the kernel; and
> after a fresh reboot I saved the dmesg output before and after
> suspend/resume. Attached is the diff file containing the resume sequence. At
> linew 133 - 134 there is a longer gap, which I think is related to the r8169
> driver.
>
> I have unloaded the module and did a new suspend/resume sequence, but this
> time with the r8169 module removed it went much faster (3-4s to resume
> completely). Can you please help here ?

Ok, this seems to be yet another case of the ridiculously common error
of "let's load the firmware early, before the machine is even up". It
sometimes happens at boot-time (driver writers only test the module
case, not the built-in case), but it happens very often at
suspend/resume time (which driver writers seem to not test at all).

It is not valid to try to load the firmware at resume time. User space
isn't running, so all the firmware loading helpers are frozen.
Firmware that needs to be reloaded after a suspend needs to be saved
to memory!

I added netdev to the recipient list, because this is definitely not
just a r8169 issue - we've had this bug happen over and over again.

Looking at the code, it looks like r8169 _tries_ to save the firmware
(it has had this SAME bug before), but in case no firmware was ever
loaded successfully at all (either because there was no firmware to
begin with, or because the device had never been opened, so it had
never tried to load it before), it _still_ tries to load the firmware.

WTF? At resume time, you should _resume_, not "load firmware even if
it wasn't loaded before".

I also wonder if the generic power management layer code could do
something smarter about this. Print a big warning when somebody wants
to load firmware when the machine isn't fully operational, instead of
the "wait 60 seconds in the totally futile hope that things will
change". Adding greg to the cc for that case, he's marked as firmware
loader maintainer.

So Francois, can we please not load the firmware at resume time when
it wasn't loaded when suspended!

Greg/Rafael/whoever - could we please replace the "sleep for a minute
if you can't load the firmware" with a big immediate WARN_ON() if
somebody tries to load it at early boot time or at resume time? That
way we don't have the machine dead for sixty seconds (most people just
assume it's dead and will just reboot - I've done that myself when
I've been hit by this), and we'd get a nice "here's the offender"
printout.

And netdev/lkml just cc'd for information, just in case some driver
writer is lurking and suddenly realizes that their driver is broken
too.

                             Linus

^ permalink raw reply

* [PATCH] proc_sysctl: use rcu_dereference before accessing proc_inode->sysctl->* members
From: Lucian Adrian Grijincu @ 2011-04-15 15:52 UTC (permalink / raw)
  To: netdev; +Cc: Lucian Adrian Grijincu, Paul E. McKenney, Al Viro, Nick Piggin

rcu_dereference/rcu_assign_pointer were added to protect some accesses
to ->sysctl in:

    commit dfef6dcd35cb4a251f6322ca9b2c06f0bb1aa1f4
    Author: Al Viro <viro@zeniv.linux.org.uk>
    Date:   Tue Mar 8 01:25:28 2011 -0500

      unfuck proc_sysctl ->d_compare()

This patch puts rcu_dereference where it's missing.

I'm not sure that this patch is needed. I may have misunderstood
Documentation/RCU/checklist.txt

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 fs/proc/proc_sysctl.c   |    9 +++++----
 include/linux/proc_fs.h |    2 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index f50133c..0f5a063 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -70,8 +70,9 @@ static struct ctl_table *find_in_table(struct ctl_table *p, struct qstr *name)
 
 static struct ctl_table_header *grab_header(struct inode *inode)
 {
-	if (PROC_I(inode)->sysctl)
-		return sysctl_head_grab(PROC_I(inode)->sysctl);
+	struct ctl_table_header *head = rcu_dereference(PROC_I(inode)->sysctl);
+	if (head)
+		return sysctl_head_grab(head);
 	else
 		return sysctl_head_next(NULL);
 }
@@ -394,12 +395,12 @@ static int proc_sys_revalidate(struct dentry *dentry, struct nameidata *nd)
 {
 	if (nd->flags & LOOKUP_RCU)
 		return -ECHILD;
-	return !PROC_I(dentry->d_inode)->sysctl->unregistering;
+	return !rcu_dereference(PROC_I(dentry->d_inode)->sysctl)->unregistering;
 }
 
 static int proc_sys_delete(const struct dentry *dentry)
 {
-	return !!PROC_I(dentry->d_inode)->sysctl->unregistering;
+	return !!rcu_dereference(PROC_I(dentry->d_inode)->sysctl)->unregistering;
 }
 
 static int proc_sys_compare(const struct dentry *parent,
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 838c114..a38cb74 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -266,7 +266,7 @@ struct proc_inode {
 	int fd;
 	union proc_op op;
 	struct proc_dir_entry *pde;
-	struct ctl_table_header *sysctl;
+	struct ctl_table_header *__rcu sysctl;
 	struct ctl_table *sysctl_entry;
 	struct inode vfs_inode;
 };
-- 
1.7.5.rc0


^ permalink raw reply related

* [RFC] possible bug in inet->opt handling
From: Eric Dumazet @ 2011-04-15 15:39 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, netdev

In commit 903ab86d19 (udp: Add lockless transmit path), we added a
fastpath to avoid taking socket lock if we dont use corking.

Prior work were commit 1c32c5ad6fac8c (inet: Add ip_make_skb and
ip_finish_skb) and commit 1470ddf7f8cecf776921e5 (inet: Remove explicit
write references to sk/inet in ip_append_data)

Problem is ip_make_skb() calls ip_setup_cork() and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
without any protection against another thread manipulating inet->opt.

Another thread can change inet->opt pointer and free old one... kaboom.

This was discovered by code analysis (I am trying to remove the zeroing
of cork variable in ip_make_skb(), since its a bit expensive and
probably useless)

Note : race was there before Herbert patches.

My plan is to add RCU protection on inet->opt, unless someone has better
idea ?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox