Netdev List
* [PATCH v2 0/2] tlan improvements
From: Sakari Ailus @ 2011-01-21 20:59 UTC
  To: netdev; +Cc: Samuel Chessman, Joe Perches

Hi,

This is the second version of the tlan improvements patchset.

This patchset cleans up the tlan driver code and adds suspend/resume
support. The coding style changes are extensive. One checkpatch.pl
warning is left.

The last patch, which is the biggest usability improvement to the tlan
driver in this set, brings suspend/resume support. I borrowed a few bits
from the e100 driver, which I know has received more attention from
developers over the years. That, I believe, is the only non-trivial
change in the set.

Changes since v1:

The second patch has been dropped. Ben Hutchings pointed out that,
instead of fixing the prints, the ethtool interface should be supported.
I do not have time to do that right now, and I would like to get the
cleanup and possibly also the suspend/resume support in without it.

Cheers,

-- 
Sakari Ailus
sakari.ailus@iki.fi


* Re: [PATCH] tlan: Use pr_fmt, pr_<level> and netdev_<level>, remove changelog
From: Sakari Ailus @ 2011-01-21 20:49 UTC
  To: Joe Perches; +Cc: netdev, Samuel Chessman
In-Reply-To: <1294354088.12561.295.camel@Joe-Laptop>

Joe Perches wrote:
> Neatening and standardization to the standard logging mechanisms.
> The changelog isn't useful anymore.
> Miscellaneous speen/speed typo correction.

Hi Joe,

Many thanks for the patch!

I definitely think it's good to replace the pr_* prints with netdev_*
macros. I have a few other comments on your patch below.

> Signed-off-by: Joe Perches<joe@perches.com>
> ---
>
> On top of Sakari Ailus' patches...
>
>   drivers/net/tlan.c |  304 +++++++++++++---------------------------------------
>   1 files changed, 74 insertions(+), 230 deletions(-)
>
> diff --git a/drivers/net/tlan.c b/drivers/net/tlan.c
> index bbb0b12..ecfae1d 100644
> --- a/drivers/net/tlan.c
> +++ b/drivers/net/tlan.c
> @@ -25,153 +25,10 @@
>    *		Microchip Technology, 24C01A/02A/04A Data Sheet
>    *			available in PDF format from www.microchip.com
>    *
> - * Change History
> - *
> - *	Tigran Aivazian<tigran@sco.com>:	TLan_PciProbe() now uses
> - *						new PCI BIOS interface.
> - *	Alan Cox	<alan@lxorguk.ukuu.org.uk>:
> - *						Fixed the out of memory
> - *						handling.
> - *
> - *	Torben Mathiasen<torben.mathiasen@compaq.com>  New Maintainer!
> - *
> - *	v1.1 Dec 20, 1999    - Removed linux version checking
> - *			       Patch from Tigran Aivazian.
> - *			     - v1.1 includes Alan's SMP updates.
> - *			     - We still have problems on SMP though,
> - *			       but I'm looking into that.
> - *
> - *	v1.2 Jan 02, 2000    - Hopefully fixed the SMP deadlock.
> - *			     - Removed dependency of HZ being 100.
> - *			     - We now allow higher priority timers to
> - *			       overwrite timers like TLAN_TIMER_ACTIVITY
> - *			       Patch from John Cagle<john.cagle@compaq.com>.
> - *			     - Fixed a few compiler warnings.
> - *
> - *	v1.3 Feb 04, 2000    - Fixed the remaining HZ issues.
> - *			     - Removed call to pci_present().
> - *			     - Removed SA_INTERRUPT flag from irq handler.
> - *			     - Added __init and __initdata to reduce resisdent
> - *			       code size.
> - *			     - Driver now uses module_init/module_exit.
> - *			     - Rewrote init_module and tlan_probe to
> - *			       share a lot more code. We now use tlan_probe
> - *			       with builtin and module driver.
> - *			     - Driver ported to new net API.
> - *			     - tlan.txt has been reworked to reflect current
> - *			       driver (almost)
> - *			     - Other minor stuff
> - *
> - *	v1.4 Feb 10, 2000    - Updated with more changes required after Dave's
> - *			       network cleanup in 2.3.43pre7 (Tigran&  myself)
> - *			     - Minor stuff.
> - *
> - *	v1.5 March 22, 2000  - Fixed another timer bug that would hang the
> - *			       driver if no cable/link were present.
> - *			     - Cosmetic changes.
> - *			     - TODO: Port completely to new PCI/DMA API
> - *				     Auto-Neg fallback.
> - *
> - *	v1.6 April 04, 2000  - Fixed driver support for kernel-parameters.
> - *			       Haven't tested it though, as the kernel support
> - *			       is currently broken (2.3.99p4p3).
> - *			     - Updated tlan.txt accordingly.
> - *			     - Adjusted minimum/maximum frame length.
> - *			     - There is now a TLAN website up at
> - *			       http://hp.sourceforge.net/
> - *
> - *	v1.7 April 07, 2000  - Started to implement custom ioctls. Driver now
> - *			       reports PHY information when used with Donald
> - *			       Beckers userspace MII diagnostics utility.
> - *
> - *	v1.8 April 23, 2000  - Fixed support for forced speed/duplex settings.
> - *			     - Added link information to Auto-Neg and forced
> - *			       modes. When NIC operates with auto-neg the driver
> - *			       will report Link speed&  duplex modes as well as
> - *			       link partner abilities. When forced link is used,
> - *			       the driver will report status of the established
> - *			       link.
> - *			       Please read tlan.txt for additional information.
> - *			     - Removed call to check_region(), and used
> - *			       return value of request_region() instead.
> - *
> - *	v1.8a May 28, 2000   - Minor updates.
> - *
> - *	v1.9 July 25, 2000   - Fixed a few remaining Full-Duplex issues.
> - *			     - Updated with timer fixes from Andrew Morton.
> - *			     - Fixed module race in TLan_Open.
> - *			     - Added routine to monitor PHY status.
> - *			     - Added activity led support for Proliant devices.
> - *
> - *	v1.10 Aug 30, 2000   - Added support for EISA based tlan controllers
> - *			       like the Compaq NetFlex3/E.
> - *			     - Rewrote tlan_probe to better handle multiple
> - *			       bus probes. Probing and device setup is now
> - *			       done through TLan_Probe and TLan_init_one. Actual
> - *			       hardware probe is done with kernel API and
> - *			       TLan_EisaProbe.
> - *			     - Adjusted debug information for probing.
> - *			     - Fixed bug that would cause general debug
> - *			       information to be printed after driver removal.
> - *			     - Added transmit timeout handling.
> - *			     - Fixed OOM return values in tlan_probe.
> - *			     - Fixed possible mem leak in tlan_exit
> - *			       (now tlan_remove_one).
> - *			     - Fixed timer bug in TLan_phyMonitor.
> - *			     - This driver version is alpha quality, please
> - *			       send me any bug issues you may encounter.
> - *
> - *	v1.11 Aug 31, 2000   - Do not try to register irq 0 if no irq line was
> - *			       set for EISA cards.
> - *			     - Added support for NetFlex3/E with nibble-rate
> - *			       10Base-T PHY. This is untestet as I haven't got
> - *			       one of these cards.
> - *			     - Fixed timer being added twice.
> - *			     - Disabled PhyMonitoring by default as this is
> - *			       work in progress. Define MONITOR to enable it.
> - *			     - Now we don't display link info with PHYs that
> - *			       doesn't support it (level1).
> - *			     - Incresed tx_timeout beacuse of auto-neg.
> - *			     - Adjusted timers for forced speeds.
> - *
> - *	v1.12 Oct 12, 2000   - Minor fixes (memleak, init, etc.)
> - *
> - *	v1.13 Nov 28, 2000   - Stop flooding console with auto-neg issues
> - *			       when link can't be established.
> - *			     - Added the bbuf option as a kernel parameter.
> - *			     - Fixed ioaddr probe bug.
> - *			     - Fixed stupid deadlock with MII interrupts.
> - *			     - Added support for speed/duplex selection with
> - *			       multiple nics.
> - *			     - Added partly fix for TX Channel lockup with
> - *			       TLAN v1.0 silicon. This needs to be investigated
> - *			       further.
> - *
> - *	v1.14 Dec 16, 2000   - Added support for servicing multiple frames per.
> - *			       interrupt. Thanks goes to
> - *			       Adam Keys<adam@ti.com>
> - *			       Denis Beaudoin<dbeaudoin@ti.com>
> - *			       for providing the patch.
> - *			     - Fixed auto-neg output when using multiple
> - *			       adapters.
> - *			     - Converted to use new taskq interface.
> - *
> - *	v1.14a Jan 6, 2001   - Minor adjustments (spinlocks, etc.)
> - *
> - *	Samuel Chessman<chessman@tux.org>  New Maintainer!
> - *
> - *	v1.15 Apr 4, 2002    - Correct operation when aui=1 to be
> - *			       10T half duplex no loopback
> - *			       Thanks to Gunnar Eikman
> - *
> - *	Sakari Ailus<sakari.ailus@iki.fi>:
> - *
> - *	v1.15a Dec 15 2008   - Remove bbuf support, it doesn't work anyway.
> - *	v1.16  Jan 6  2011   - Make checkpatch.pl happy.
> - *	v1.17  Jan 6  2011   - Add suspend/resume support.
> - *

I agree with this. I just didn't think too much while writing my 
patchset. :-)

>    ******************************************************************************/
>
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
>   #include<linux/module.h>
>   #include<linux/init.h>
>   #include<linux/ioport.h>
> @@ -204,7 +61,7 @@ module_param_array(speed, int, NULL, 0);
>   MODULE_PARM_DESC(aui, "ThunderLAN use AUI port(s) (0-1)");
>   MODULE_PARM_DESC(duplex,
>   		 "ThunderLAN duplex setting(s) (0-default, 1-half, 2-full)");
> -MODULE_PARM_DESC(speed, "ThunderLAN port speen setting(s) (0,10,100)");
> +MODULE_PARM_DESC(speed, "ThunderLAN port speed setting(s) (0,10,100)");
>
>   MODULE_AUTHOR("Maintainer: Samuel Chessman<chessman@tux.org>");
>   MODULE_DESCRIPTION("Driver for TI ThunderLAN based ethernet PCI adapters");
> @@ -542,7 +399,7 @@ static int __init tlan_probe(void)
>   {
>   	int rc = -ENODEV;
>
> -	printk(KERN_INFO "%s", tlan_banner);
> +	pr_info("%s", tlan_banner);
>
>   	TLAN_DBG(TLAN_DEBUG_PROBE, "Starting PCI Probe....\n");
>
> @@ -551,16 +408,16 @@ static int __init tlan_probe(void)
>   	rc = pci_register_driver(&tlan_driver);
>
>   	if (rc != 0) {
> -		printk(KERN_ERR "TLAN: Could not register pci driver.\n");
> +		pr_err("Could not register pci driver\n");
>   		goto err_out_pci_free;
>   	}
>
>   	TLAN_DBG(TLAN_DEBUG_PROBE, "Starting EISA Probe....\n");
>   	tlan_eisa_probe();
>
> -	printk(KERN_INFO "TLAN: %d device%s installed, PCI: %d  EISA: %d\n",
> -	       tlan_devices_installed, tlan_devices_installed == 1 ? "" : "s",
> -	       tlan_have_pci, tlan_have_eisa);
> +	pr_info("%d device%s installed, PCI: %d  EISA: %d\n",
> +		tlan_devices_installed, tlan_devices_installed == 1 ? "" : "s",
> +		tlan_have_pci, tlan_have_eisa);
>
>   	if (tlan_devices_installed == 0) {
>   		rc = -ENODEV;
> @@ -619,7 +476,7 @@ static int __devinit tlan_probe1(struct pci_dev *pdev,
>
>   		rc = pci_request_regions(pdev, tlan_signature);
>   		if (rc) {
> -			printk(KERN_ERR "TLAN: Could not reserve IO regions\n");
> +			pr_err("Could not reserve IO regions\n");

I think that, now that we do have a struct device (pci_dev.dev) 
reference, we should use dev_* macros.
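
To illustrate the suggestion, here is a minimal userspace sketch (not
kernel code). The struct definitions, mock_dev_err() and
report_region_failure() are hypothetical stand-ins invented for this
example; only the general "driver name, device name, message" shape of
dev_* output is taken from the kernel helpers.

```c
#include <stdarg.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct device {
	const char *driver_name;	/* e.g. "tlan" */
	const char *bus_id;		/* e.g. "0000:00:0b.0" */
};

struct pci_dev {
	struct device dev;
};

/*
 * Mock of dev_err(): formats "<driver> <bus id>: <message>" into buf,
 * which is roughly the prefix the kernel's dev_* helpers produce.
 * Returns the number of characters that were (or would be) written.
 */
static int mock_dev_err(char *buf, size_t len, const struct device *dev,
			const char *fmt, ...)
{
	va_list args;
	int n;

	n = snprintf(buf, len, "%s %s: ", dev->driver_name, dev->bus_id);
	if (n < 0 || (size_t)n >= len)
		return n;

	va_start(args, fmt);
	n += vsnprintf(buf + n, len - (size_t)n, fmt, args);
	va_end(args);
	return n;
}

/* Hypothetical probe-path helper: report a failed region request. */
static int report_region_failure(char *buf, size_t len,
				 const struct pci_dev *pdev)
{
	/* No hand-written "TLAN: " prefix: the device identifies itself. */
	return mock_dev_err(buf, len, &pdev->dev,
			    "Could not reserve IO regions\n");
}
```

In the driver itself this would presumably be a plain
dev_err(&pdev->dev, ...) call in tlan_probe1(); the mock only shows why
the message no longer needs a manual prefix.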

I think I'll just resend my patchset with Ben's comments addressed; that
is, for now I'll simply drop my second patch. I'm keeping the first one
as-is, since it's important that it gets in. Everything else will
conflict with it!

Ethtool interface support could perhaps be a topic for another set?

>   			goto err_out;
>   		}
>   	}
> @@ -627,7 +484,7 @@ static int __devinit tlan_probe1(struct pci_dev *pdev,
>
>   	dev = alloc_etherdev(sizeof(struct tlan_priv));
>   	if (dev == NULL) {
> -		printk(KERN_ERR "TLAN: Could not allocate memory for device.\n");
> +		pr_err("Could not allocate memory for device\n");

dev_err() also here.

>   		rc = -ENOMEM;
>   		goto err_out_regions;
>   	}
> @@ -646,8 +503,7 @@ static int __devinit tlan_probe1(struct pci_dev *pdev,
>
>   		rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
>   		if (rc) {
> -			printk(KERN_ERR
> -			       "TLAN: No suitable PCI mapping available.\n");
> +			pr_err("No suitable PCI mapping available\n");

And here.

>   			goto err_out_free_dev;
>   		}
>
> @@ -661,7 +517,7 @@ static int __devinit tlan_probe1(struct pci_dev *pdev,
>   			}
>   		}
>   		if (!pci_io_base) {
> -			printk(KERN_ERR "TLAN: No IO mappings available\n");
> +			pr_err("No IO mappings available\n");
>   			rc = -EIO;
>   			goto err_out_free_dev;
>   		}
> @@ -717,13 +573,13 @@ static int __devinit tlan_probe1(struct pci_dev *pdev,
>
>   	rc = tlan_init(dev);
>   	if (rc) {
> -		printk(KERN_ERR "TLAN: Could not set up device.\n");
> +		pr_err("Could not set up device\n");
>   		goto err_out_free_dev;
>   	}
>
>   	rc = register_netdev(dev);
>   	if (rc) {
> -		printk(KERN_ERR "TLAN: Could not register device.\n");
> +		pr_err("Could not register device\n");
>   		goto err_out_uninit;
>   	}
>
> @@ -740,12 +596,11 @@ static int __devinit tlan_probe1(struct pci_dev *pdev,
>   		tlan_have_eisa++;
>   	}
>
> -	printk(KERN_INFO "TLAN: %s irq=%2d, io=%04x, %s, Rev. %d\n",
> -	       dev->name,
> -	       (int) dev->irq,
> -	       (int) dev->base_addr,
> -	       priv->adapter->device_label,
> -	       priv->adapter_rev);
> +	netdev_info(dev, "irq=%2d, io=%04x, %s, Rev. %d\n",
> +		    (int)dev->irq,
> +		    (int)dev->base_addr,
> +		    priv->adapter->device_label,
> +		    priv->adapter_rev);
>   	return 0;
>
>   err_out_uninit:
> @@ -861,7 +716,7 @@ static void  __init tlan_eisa_probe(void)
>   		}
>
>   		if (debug == 0x10)
> -			printk(KERN_INFO "Found one\n");
> +			pr_info("Found one\n");
>
>
>   		/* Get irq from board */
> @@ -890,12 +745,12 @@ static void  __init tlan_eisa_probe(void)
>
>   out:
>   		if (debug == 0x10)
> -			printk(KERN_INFO "None found\n");
> +			pr_info("None found\n");
>   		continue;
>
>   out2:
>   		if (debug == 0x10)
> -			printk(KERN_INFO "Card found but it is not enabled, skipping\n");
> +			pr_info("Card found but it is not enabled, skipping\n");
>   		continue;
>
>   	}
> @@ -963,8 +818,7 @@ static int tlan_init(struct net_device *dev)
>   	priv->dma_size = dma_size;
>
>   	if (priv->dma_storage == NULL) {
> -		printk(KERN_ERR
> -		       "TLAN:  Could not allocate lists and buffers for %s.\n",
> +		pr_err("Could not allocate lists and buffers for %s\n",
>   		       dev->name);
>   		return -ENOMEM;
>   	}
> @@ -982,9 +836,8 @@ static int tlan_init(struct net_device *dev)
>   					 (u8) priv->adapter->addr_ofs + i,
>   					 (u8 *)&dev->dev_addr[i]);
>   	if (err) {
> -		printk(KERN_ERR "TLAN: %s: Error reading MAC from eeprom: %d\n",
> -		       dev->name,
> -		       err);
> +		pr_err("%s: Error reading MAC from eeprom: %d\n",
> +		       dev->name, err);
>   	}
>   	dev->addr_len = 6;
>
> @@ -1028,8 +881,8 @@ static int tlan_open(struct net_device *dev)
>   			  dev->name, dev);
>
>   	if (err) {
> -		pr_err("TLAN:  Cannot open %s because IRQ %d is already in use.\n",
> -		       dev->name, dev->irq);
> +		netdev_err(dev, "Cannot open because IRQ %d is already in use\n",
> +			   dev->irq);
>   		return err;
>   	}
>
> @@ -1512,8 +1365,8 @@ static u32 tlan_handle_tx_eof(struct net_device *dev, u16 host_int)
>   	}
>
>   	if (!ack)
> -		printk(KERN_INFO
> -		       "TLAN: Received interrupt for uncompleted TX frame.\n");
> +		netdev_info(dev,
> +			    "Received interrupt for uncompleted TX frame\n");
>
>   	if (eoc) {
>   		TLAN_DBG(TLAN_DEBUG_TX,
> @@ -1666,8 +1519,8 @@ drop_and_reuse:
>   	}
>
>   	if (!ack)
> -		printk(KERN_INFO
> -		       "TLAN: Received interrupt for uncompleted RX frame.\n");
> +		netdev_info(dev,
> +			    "Received interrupt for uncompleted RX frame\n");
>
>
>   	if (eoc) {
> @@ -1723,7 +1576,7 @@ drop_and_reuse:
>
>   static u32 tlan_handle_dummy(struct net_device *dev, u16 host_int)
>   {
> -	pr_info("TLAN:  Test interrupt on %s.\n", dev->name);
> +	netdev_info(dev, "Test interrupt\n");
>   	return 1;
>
>   }
> @@ -1816,7 +1669,7 @@ static u32 tlan_handle_status_check(struct net_device *dev, u16 host_int)
>   	if (host_int&  TLAN_HI_IV_MASK) {
>   		netif_stop_queue(dev);
>   		error = inl(dev->base_addr + TLAN_CH_PARM);
> -		pr_info("TLAN:  %s: Adaptor Error = 0x%x\n", dev->name, error);
> +		netdev_info(dev, "Adaptor Error = 0x%x\n", error);
>   		tlan_read_and_clear_stats(dev, TLAN_RECORD);
>   		outl(TLAN_HC_AD_RST, dev->base_addr + TLAN_HOST_CMD);
>
> @@ -2057,7 +1910,7 @@ static void tlan_reset_lists(struct net_device *dev)
>   		list->buffer[0].count = TLAN_MAX_FRAME_SIZE | TLAN_LAST_BUFFER;
>   		skb = netdev_alloc_skb_ip_align(dev, TLAN_MAX_FRAME_SIZE + 5);
>   		if (!skb) {
> -			pr_err("TLAN: out of memory for received data.\n");
> +			netdev_err(dev, "Out of memory for received data\n");
>   			break;
>   		}
>
> @@ -2141,13 +1994,13 @@ static void tlan_print_dio(u16 io_base)
>   	u32 data0, data1;
>   	int	i;
>
> -	pr_info("TLAN:   Contents of internal registers for io base 0x%04hx.\n",
> -	       io_base);
> -	pr_info("TLAN:      Off.  +0	 +4\n");
> +	pr_info("Contents of internal registers for io base 0x%04hx\n",
> +		io_base);
> +	pr_info("Off.  +0        +4\n");

I think struct net_device could replace io_base as the argument. Then
we'd have a struct net_device available and could use netdev_info.
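
A rough userspace sketch of what that signature change could look like
follows. Everything in it is a stand-in made up for illustration: the
cut-down struct net_device, fake_dio_read32() (in place of
tlan_dio_read32()) and print_dio_sketch(). The fprintf() prefix only
approximates netdev_info(), which also prints driver and bus
information.

```c
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for struct net_device; only the fields used here. */
struct net_device {
	char name[16];
	unsigned long base_addr;
};

/* Fake register reader standing in for tlan_dio_read32(). */
static uint32_t fake_dio_read32(uint16_t io_base, uint16_t reg)
{
	return ((uint32_t)io_base << 16) | reg;
}

/*
 * Sketch of a tlan_print_dio() variant that takes the net_device and
 * derives io_base from it. Every line carries the interface name.
 * Returns the number of register rows printed.
 */
static int print_dio_sketch(FILE *out, const struct net_device *dev)
{
	uint16_t io_base = (uint16_t)dev->base_addr;
	uint32_t data0, data1;
	unsigned int i;
	int rows = 0;

	fprintf(out, "%s: Contents of internal registers for io base 0x%04x\n",
		dev->name, (unsigned int)io_base);
	fprintf(out, "%s: Off.  +0        +4\n", dev->name);

	/* Same register window as the original loop: 0x00..0x4B, 8 bytes per row. */
	for (i = 0; i < 0x4C; i += 8) {
		data0 = fake_dio_read32(io_base, (uint16_t)i);
		data1 = fake_dio_read32(io_base, (uint16_t)(i + 0x4));
		fprintf(out, "%s: 0x%02x  0x%08x 0x%08x\n", dev->name, i,
			(unsigned int)data0, (unsigned int)data1);
		rows++;
	}
	return rows;
}
```

Callers that currently pass dev->base_addr would pass dev instead, so
the output identifies which interface the dump belongs to.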

>   	for (i = 0; i<  0x4C; i += 8) {
>   		data0 = tlan_dio_read32(io_base, i);
>   		data1 = tlan_dio_read32(io_base, i + 0x4);
> -		pr_info("TLAN:      0x%02x  0x%08x 0x%08x\n", i, data0, data1);
> +		pr_info("0x%02x  0x%08x 0x%08x\n", i, data0, data1);
>   	}
>
>   }
> @@ -2176,14 +2029,14 @@ static void tlan_print_list(struct tlan_list *list, char *type, int num)

Add struct net_device here as well.

>   {
>   	int i;
>
> -	pr_info("TLAN:   %s List %d at %p\n", type, num, list);
> -	pr_info("TLAN:      Forward    = 0x%08x\n",  list->forward);
> -	pr_info("TLAN:      CSTAT      = 0x%04hx\n", list->c_stat);
> -	pr_info("TLAN:      Frame Size = 0x%04hx\n", list->frame_size);
> +	pr_info("%s List %d at %p\n", type, num, list);
> +	pr_info("   Forward    = 0x%08x\n",  list->forward);
> +	pr_info("   CSTAT      = 0x%04hx\n", list->c_stat);
> +	pr_info("   Frame Size = 0x%04hx\n", list->frame_size);
>   	/* for (i = 0; i<  10; i++) { */
>   	for (i = 0; i<  2; i++) {
> -		pr_info("TLAN:      Buffer[%d].count, addr = 0x%08x, 0x%08x\n",
> -		       i, list->buffer[i].count, list->buffer[i].address);
> +		pr_info("   Buffer[%d].count, addr = 0x%08x, 0x%08x\n",
> +			i, list->buffer[i].count, list->buffer[i].address);
>   	}
>
>   }
> @@ -2398,7 +2251,7 @@ tlan_finish_reset(struct net_device *dev)
>   	if ((priv->adapter->flags&  TLAN_ADAPTER_UNMANAGED_PHY) ||
>   	    (priv->aui)) {
>   		status = MII_GS_LINK;
> -		pr_info("TLAN:  %s: Link forced.\n", dev->name);
> +		netdev_info(dev, "Link forced\n");
>   	} else {
>   		tlan_mii_read_reg(dev, phy, MII_GEN_STS,&status);
>   		udelay(1000);
> @@ -2410,24 +2263,20 @@ tlan_finish_reset(struct net_device *dev)
>   			tlan_mii_read_reg(dev, phy, MII_AN_LPA,&partner);
>   			tlan_mii_read_reg(dev, phy, TLAN_TLPHY_PAR,&tlphy_par);
>
> -			pr_info("TLAN: %s: Link active with ", dev->name);
> -			if (!(tlphy_par&  TLAN_PHY_AN_EN_STAT)) {
> -				pr_info("forced 10%sMbps %s-Duplex\n",
> -					tlphy_par&  TLAN_PHY_SPEED_100
> -					? "" : "0",
> -					tlphy_par&  TLAN_PHY_DUPLEX_FULL
> -					? "Full" : "Half");
> -			} else {
> -				pr_info("Autonegotiation enabled, at 10%sMbps %s-Duplex\n",
> -					tlphy_par&  TLAN_PHY_SPEED_100
> -					? "" : "0",
> -					tlphy_par&  TLAN_PHY_DUPLEX_FULL
> -					? "Full" : "half");
> -				pr_info("TLAN: Partner capability:");
> +			netdev_info(dev,
> +				    "Link active with %s %uMbps %s-Duplex\n",
> +				    !(tlphy_par&  TLAN_PHY_AN_EN_STAT)
> +				    ? "forced" : "Autonegotiation enabled,",
> +				    tlphy_par&  TLAN_PHY_SPEED_100
> +				    ? 100 : 10,
> +				    tlphy_par&  TLAN_PHY_DUPLEX_FULL
> +				    ? "Full" : "Half");
> +			if (tlphy_par&  TLAN_PHY_AN_EN_STAT) {
> +				netdev_info(dev, "Partner capability:");
>   				for (i = 5; i<  10; i++)
>   					if (partner&  (1<<i))
> -						printk(" %s", media[i-5]);
> -				printk("\n");
> +						pr_cont(" %s", media[i-5]);
> +				pr_cont("\n");

I think these prints could be removed by a separate patch before this 
one. Would you like to do that, or shall I? :-)

No further comments on this one. Thanks.

Cheers,

-- 
Sakari Ailus
sakari.ailus@iki.fi


* Re: [regression] 2.6.37+ commit 0363466866d9.... breaks tcp ipv6
From: Hans de Bruin @ 2011-01-21 20:38 UTC
  To: Eric Dumazet; +Cc: Jesse Gross, netdev
In-Reply-To: <1295639745.2609.29.camel@edumazet-laptop>

On 01/21/2011 08:55 PM, Eric Dumazet wrote:
> Le vendredi 21 janvier 2011 à 20:47 +0100, Hans de Bruin a écrit :
>> On 01/18/2011 11:03 PM, Eric Dumazet wrote:
>>> Le mardi 18 janvier 2011 à 22:42 +0100, Hans de Bruin a écrit :
>>>> On 01/18/2011 09:06 PM, Jesse Gross wrote:
>>
>> ...
>>
>>> You could try "tcpdump -i eth0 ip6 -v"
>>>
>>> I guess you receive frames with bad checksums
>>
>> While you were staring at the code, I was fooling around with tcpdump.
>> And while the problem is fixed, I still have some questions:
>>
>> Is there a tool which shows whether a NIC supports IPv6 checksum offload
>> or not?
>>
>> I have captured HTTP traffic (wget http://bootes/) between psion (my
>> laptop, which follows the git tree) and bootes (something running
>> 2.6.33.7). Attached is a capture with psion running 2.6.37 and one with
>> this morning's git tree. What's with the 'cksum ... (incorrect -> ' lines?
>> ifconfig does not show errors on either of the machines.
>>
>
> tcpdump gets a copy of outgoing frames before the NIC computes the TX
> checksum (if TX checksumming is handled by the NIC), so it's normal to see
> "bad checksums" on TX unless you disable TX offloading
> (ethtool -K eth0 tx off)

That seems reasonable, but the bug was triggered because my NIC could
not offload checksumming, so what does tx=on mean if there's no support
for it? I have turned tx off and tcpdump still shows bad checksums on
outgoing TCP/IPv6 packets. I have also tried 2.6.36: bad checksums. And
2.6.35, surprise: good checksums with tx=on.

>
> I was referring to checking incoming frames with tcpdump, because invalid
> checksums on RX are a sign that the other peer sent wrong checksums

OK, that's clear: the receiving site is apparently a more reliable
checksummer than the sending site.

-- 
Hans


* Re: [regression] 2.6.37+ commit 0363466866d9.... breaks tcp ipv6
From: Eric Dumazet @ 2011-01-21 19:55 UTC
  To: Hans de Bruin; +Cc: Jesse Gross, netdev
In-Reply-To: <4D39E2EC.9020906@xmsnet.nl>

Le vendredi 21 janvier 2011 à 20:47 +0100, Hans de Bruin a écrit :
> On 01/18/2011 11:03 PM, Eric Dumazet wrote:
> > Le mardi 18 janvier 2011 à 22:42 +0100, Hans de Bruin a écrit :
> >> On 01/18/2011 09:06 PM, Jesse Gross wrote:
> 
> ...
> 
> > You could try "tcpdump -i eth0 ip6 -v"
> >
> > I guess you receive frames with bad checksums
> 
> While you were staring at the code, I was fooling around with tcpdump.
> And while the problem is fixed, I still have some questions:
>
> Is there a tool which shows whether a NIC supports IPv6 checksum offload
> or not?
>
> I have captured HTTP traffic (wget http://bootes/) between psion (my
> laptop, which follows the git tree) and bootes (something running
> 2.6.33.7). Attached is a capture with psion running 2.6.37 and one with
> this morning's git tree. What's with the 'cksum ... (incorrect -> ' lines?
> ifconfig does not show errors on either of the machines.
> 

tcpdump gets a copy of outgoing frames before the NIC computes the TX
checksum (if TX checksumming is handled by the NIC), so it's normal to see
"bad checksums" on TX unless you disable TX offloading
(ethtool -K eth0 tx off)

I was referring to checking incoming frames with tcpdump, because invalid
checksums on RX are a sign that the other peer sent wrong checksums
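
For reference, the value tcpdump compares against can be sketched in
plain C. This is a minimal illustration of the Internet checksum taken
over the IPv6 pseudo-header plus the TCP segment, not code from the
kernel or from tcpdump. The function names are made up for this example,
and it assumes segments well under 64 KiB so the 32-bit accumulator
cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum as big-endian 16-bit words. */
static uint32_t sum16(uint32_t sum, const uint8_t *data, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* zero-pad odd byte */
	return sum;
}

/* Fold the carries back in (ones-complement addition). */
static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Checksum of an upper-layer segment carried in IPv6 (TCP: next_header 6).
 * Pseudo-header: source address, destination address, 32-bit upper-layer
 * length, three zero bytes, next header. When generating a checksum, the
 * checksum field inside seg must be zero.
 */
static uint16_t ipv6_l4_checksum(const uint8_t src[16], const uint8_t dst[16],
				 uint8_t next_header,
				 const uint8_t *seg, uint32_t seg_len)
{
	uint8_t tail[8] = {
		(uint8_t)(seg_len >> 24), (uint8_t)(seg_len >> 16),
		(uint8_t)(seg_len >> 8), (uint8_t)seg_len,
		0, 0, 0, next_header
	};
	uint32_t sum = 0;

	sum = sum16(sum, src, 16);
	sum = sum16(sum, dst, 16);
	sum = sum16(sum, tail, sizeof(tail));
	sum = sum16(sum, seg, seg_len);
	return (uint16_t)~fold(sum);
}

/* A segment verifies if summing it, checksum field included, yields 0. */
static int ipv6_l4_checksum_ok(const uint8_t src[16], const uint8_t dst[16],
			       uint8_t next_header,
			       const uint8_t *seg, uint32_t seg_len)
{
	return ipv6_l4_checksum(src, dst, next_header, seg, seg_len) == 0;
}
```

A frame captured on TX before the NIC has filled in the checksum field
fails this check even though the frame on the wire is fine, which is why
only the RX side of a capture is conclusive.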





* Re: [regression] 2.6.37+ commit 0363466866d9.... breaks tcp ipv6
From: Hans de Bruin @ 2011-01-21 19:47 UTC
  To: Eric Dumazet; +Cc: Jesse Gross, netdev
In-Reply-To: <1295388238.8449.10.camel@edumazet-laptop>

[-- Attachment #1: Type: text/plain, Size: 815 bytes --]

On 01/18/2011 11:03 PM, Eric Dumazet wrote:
> Le mardi 18 janvier 2011 à 22:42 +0100, Hans de Bruin a écrit :
>> On 01/18/2011 09:06 PM, Jesse Gross wrote:

...

> You could try "tcpdump -i eth0 ip6 -v"
>
> I guess you receive frames with bad checksums

While you were staring at the code, I was fooling around with tcpdump.
And while the problem is fixed, I still have some questions:

Is there a tool which shows whether a NIC supports IPv6 checksum offload
or not?

I have captured HTTP traffic (wget http://bootes/) between psion (my
laptop, which follows the git tree) and bootes (something running
2.6.33.7). Attached is a capture with psion running 2.6.37 and one with
this morning's git tree. What's with the 'cksum ... (incorrect -> ' lines?
ifconfig does not show errors on either of the machines.

-- 
Hans

[-- Attachment #2: dump.bootes.37 --]
[-- Type: text/plain, Size: 3902 bytes --]

20:16:18.258004 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 56) fe80::230:18ff:feae:75d8 > ff02::1: ICMP6, router advertisement, length 56
	hop limit 64, Flags [none], pref medium, router lifetime 30s, reachable time 0s, retrans time 0s[ndp opt]
20:16:18.288694 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) psion.system > ff02::1:ff00:2: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has orion.system
	  source link-address option (1), length 8 (1): 00:1c:23:2d:73:87
20:16:19.028550 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) psion.system > ff02::1:ff00:7: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has bootes.system
	  source link-address option (1), length 8 (1): 00:1c:23:2d:73:87
20:16:19.028665 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) bootes.system > psion.system: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is bootes.system, Flags [router, solicited, override]
	  destination link-address option (2), length 8 (1): 00:30:18:a8:89:7a
20:16:19.029000 IP6 (hlim 64, next-header TCP (6) payload length: 40) psion.system.56503 > bootes.system.http: Flags [S], seq 4077389508, win 4320, options [mss 1440,sackOK,TS val 131831 ecr 0,[|tcp]>
20:16:19.029122 IP6 (hlim 64, next-header TCP (6) payload length: 40) bootes.system.http > psion.system.56503: Flags [S.], seq 2926278613, ack 4077389509, win 5712, options [mss 1440,sackOK,TS val 1831852795 ecr 131831,[|tcp]>
20:16:19.029780 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56503 > bootes.system.http: Flags [.], cksum 0x3b67 (correct), ack 1, win 68, options [nop,nop,TS val 131832 ecr 1831852795], length 0
20:16:19.029821 IP6 (hlim 64, next-header TCP (6) payload length: 136) psion.system.56503 > bootes.system.http: Flags [P.], ack 1, win 68, options [nop,nop,TS val 131832 ecr 1831852795], length 104
20:16:19.029901 IP6 (hlim 64, next-header TCP (6) payload length: 32) bootes.system.http > psion.system.56503: Flags [.], cksum 0x3b15 (correct), ack 105, win 45, options [nop,nop,TS val 1831852796 ecr 131832], length 0
20:16:19.041170 IP6 (hlim 64, next-header TCP (6) payload length: 1460) bootes.system.http > psion.system.56503: Flags [.], ack 105, win 45, options [nop,nop,TS val 1831852808 ecr 131832], length 1428
20:16:19.041238 IP6 (hlim 64, next-header TCP (6) payload length: 679) bootes.system.http > psion.system.56503: Flags [P.], ack 105, win 45, options [nop,nop,TS val 1831852808 ecr 131832], length 647
20:16:19.042492 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56503 > bootes.system.http: Flags [.], cksum 0x352e (correct), ack 1429, win 113, options [nop,nop,TS val 131835 ecr 1831852808], length 0
20:16:19.042564 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56503 > bootes.system.http: Flags [.], cksum 0x327b (correct), ack 2076, win 157, options [nop,nop,TS val 131835 ecr 1831852808], length 0
20:16:19.051735 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56503 > bootes.system.http: Flags [F.], cksum 0x3278 (correct), seq 105, ack 2076, win 157, options [nop,nop,TS val 131837 ecr 1831852808], length 0
20:16:19.051882 IP6 (hlim 64, next-header TCP (6) payload length: 32) bootes.system.http > psion.system.56503: Flags [F.], cksum 0x32dd (correct), seq 2076, ack 106, win 45, options [nop,nop,TS val 1831852818 ecr 131837], length 0
20:16:19.052234 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56503 > bootes.system.http: Flags [.], cksum 0x326d (correct), ack 2077, win 157, options [nop,nop,TS val 131837 ecr 1831852818], length 0
20:16:22.668566 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 56) fe80::230:18ff:feae:75d8 > ff02::1: ICMP6, router advertisement, length 56
	hop limit 64, Flags [none], pref medium, router lifetime 30s, reachable time 0s, retrans time 0s[ndp opt]


[-- Attachment #3: dump.bootes.git --]
[-- Type: text/plain, Size: 2901 bytes --]

20:22:35.329370 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 56) fe80::230:18ff:feae:75d8 > ff02::1: ICMP6, router advertisement, length 56
	hop limit 64, Flags [none], pref medium, router lifetime 30s, reachable time 0s, retrans time 0s[ndp opt]
20:22:37.760934 IP6 (hlim 64, next-header TCP (6) payload length: 40) psion.system.56995 > bootes.system.http: Flags [S], seq 531795977, win 14400, options [mss 1440,sackOK,TS val 4294944543 ecr 0,[|tcp]>
20:22:37.761054 IP6 (hlim 64, next-header TCP (6) payload length: 40) bootes.system.http > psion.system.56995: Flags [S.], seq 3711662312, ack 531795978, win 5712, options [mss 1440,sackOK,TS val 1832231527 ecr 4294944543,[|tcp]>
20:22:37.761441 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56995 > bootes.system.http: Flags [.], cksum 0xe774 (correct), ack 1, win 225, options [nop,nop,TS val 4294944543 ecr 1832231527], length 0
20:22:37.761871 IP6 (hlim 64, next-header TCP (6) payload length: 136) psion.system.56995 > bootes.system.http: Flags [P.], ack 1, win 225, options [nop,nop,TS val 4294944543 ecr 1832231527], length 104
20:22:37.761958 IP6 (hlim 64, next-header TCP (6) payload length: 32) bootes.system.http > psion.system.56995: Flags [.], cksum 0xe7bf (correct), ack 105, win 45, options [nop,nop,TS val 1832231528 ecr 4294944543], length 0
20:22:37.767415 IP6 (hlim 64, next-header TCP (6) payload length: 1460) bootes.system.http > psion.system.56995: Flags [.], ack 105, win 45, options [nop,nop,TS val 1832231534 ecr 4294944543], length 1428
20:22:37.767476 IP6 (hlim 64, next-header TCP (6) payload length: 679) bootes.system.http > psion.system.56995: Flags [P.], ack 105, win 45, options [nop,nop,TS val 1832231534 ecr 4294944543], length 647
20:22:37.768236 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56995 > bootes.system.http: Flags [.], cksum 0xe142 (correct), ack 1429, win 270, options [nop,nop,TS val 4294944545 ecr 1832231534], length 0
20:22:37.768293 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56995 > bootes.system.http: Flags [.], cksum 0xde8e (correct), ack 2076, win 315, options [nop,nop,TS val 4294944545 ecr 1832231534], length 0
20:22:37.781697 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56995 > bootes.system.http: Flags [F.], cksum 0xde8a (correct), seq 105, ack 2076, win 315, options [nop,nop,TS val 4294944548 ecr 1832231534], length 0
20:22:37.781854 IP6 (hlim 64, next-header TCP (6) payload length: 32) bootes.system.http > psion.system.56995: Flags [F.], cksum 0xdf89 (correct), seq 2076, ack 106, win 45, options [nop,nop,TS val 1832231548 ecr 4294944548], length 0
20:22:37.782421 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56995 > bootes.system.http: Flags [.], cksum 0xde7b (correct), ack 2077, win 315, options [nop,nop,TS val 4294944548 ecr 1832231548], length 0


[-- Attachment #4: dump.psion.37 --]
[-- Type: text/plain, Size: 6361 bytes --]

20:17:22.849151 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 56) fe80::230:18ff:feae:75d8 > ff02::1: ICMP6, router advertisement, length 56
	hop limit 64, Flags [none], pref medium, router lifetime 30s, reachable time 0s, retrans time 0s[ndp opt]
20:17:22.879242 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) psion.system > ff02::1:ff00:2: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has orion.system
	  source link-address option (1), length 8 (1): 00:1c:23:2d:73:87
20:17:22.880017 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) orion.system > psion.system: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is orion.system, Flags [solicited, override]
	  destination link-address option (2), length 8 (1): 00:16:3e:00:00:02
20:17:22.880053 IP6 (hlim 64, next-header UDP (17) payload length: 98) psion.system.47203 > orion.system.domain: 41124+[|domain]
20:17:22.881325 IP6 (hlim 64, next-header UDP (17) payload length: 159) orion.system.domain > psion.system.47203: 41124 NXDomain[|domain]
20:17:22.883634 IP6 (hlim 64, next-header UDP (17) payload length: 98) psion.system.34461 > orion.system.domain: 63069+[|domain]
20:17:22.884987 IP6 (hlim 64, next-header UDP (17) payload length: 133) orion.system.domain > psion.system.34461: 63069 NXDomain*[|domain]
20:17:22.887354 IP6 (hlim 64, next-header UDP (17) payload length: 98) psion.system.46516 > orion.system.domain: 55721+[|domain]
20:17:22.888798 IP6 (hlim 64, next-header UDP (17) payload length: 159) orion.system.domain > psion.system.46516: 55721 NXDomain[|domain]
20:17:22.890306 IP6 (hlim 64, next-header UDP (17) payload length: 98) psion.system.46047 > orion.system.domain: 17153+[|domain]
20:17:22.891585 IP6 (hlim 64, next-header UDP (17) payload length: 188) orion.system.domain > psion.system.46047: 17153*[|domain]
20:17:22.893211 IP6 (hlim 64, next-header UDP (17) payload length: 98) psion.system.43776 > orion.system.domain: 1636+[|domain]
20:17:22.894542 IP6 (hlim 64, next-header UDP (17) payload length: 182) orion.system.domain > psion.system.43776: 1636*[|domain]
20:17:23.615864 IP6 (hlim 64, next-header UDP (17) payload length: 39) psion.system.43964 > orion.system.domain: 55214+[|domain]
20:17:23.616979 IP6 (hlim 64, next-header UDP (17) payload length: 97) orion.system.domain > psion.system.43964: 55214* 1/1/1 [|domain]
20:17:23.617211 IP6 (hlim 64, next-header UDP (17) payload length: 39) psion.system.47968 > orion.system.domain: 41048+[|domain]
20:17:23.618338 IP6 (hlim 64, next-header UDP (17) payload length: 97) orion.system.domain > psion.system.47968: 41048* 1/1/1 [|domain]
20:17:23.619254 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) psion.system > ff02::1:ff00:7: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has bootes.system
	  source link-address option (1), length 8 (1): 00:1c:23:2d:73:87
20:17:23.619739 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) bootes.system > psion.system: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is bootes.system, Flags [router, solicited, override]
	  destination link-address option (2), length 8 (1): 00:30:18:a8:89:7a
20:17:23.619758 IP6 (hlim 64, next-header TCP (6) payload length: 40) psion.system.56503 > bootes.system.http: Flags [S], seq 4077389508, win 4320, options [mss 1440,sackOK,TS val 131831 ecr 0,[|tcp]>
20:17:23.620289 IP6 (hlim 64, next-header TCP (6) payload length: 40) bootes.system.http > psion.system.56503: Flags [S.], seq 2926278613, ack 4077389509, win 5712, options [mss 1440,sackOK,TS val 1831852795 ecr 131831,[|tcp]>
20:17:23.620325 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56503 > bootes.system.http: Flags [.], cksum 0xf2fb (incorrect -> 0x3b67), ack 1, win 68, options [nop,nop,TS val 131832 ecr 1831852795], length 0
20:17:23.620423 IP6 (hlim 64, next-header TCP (6) payload length: 136) psion.system.56503 > bootes.system.http: Flags [P.], ack 1, win 68, options [nop,nop,TS val 131832 ecr 1831852795], length 104
20:17:23.621055 IP6 (hlim 64, next-header TCP (6) payload length: 32) bootes.system.http > psion.system.56503: Flags [.], cksum 0x3b15 (correct), ack 105, win 45, options [nop,nop,TS val 1831852796 ecr 131832], length 0
20:17:23.621328 IP6 (hlim 64, next-header UDP (17) payload length: 98) psion.system.44443 > orion.system.domain: 48669+[|domain]
20:17:23.622786 IP6 (hlim 64, next-header UDP (17) payload length: 159) orion.system.domain > psion.system.44443: 48669 NXDomain[|domain]
20:17:23.624060 IP6 (hlim 64, next-header UDP (17) payload length: 98) psion.system.41904 > orion.system.domain: 13668+[|domain]
20:17:23.625741 IP6 (hlim 64, next-header UDP (17) payload length: 189) orion.system.domain > psion.system.41904: 13668*[|domain]
20:17:23.633020 IP6 (hlim 64, next-header TCP (6) payload length: 1460) bootes.system.http > psion.system.56503: Flags [.], ack 105, win 45, options [nop,nop,TS val 1831852808 ecr 131832], length 1428
20:17:23.633076 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56503 > bootes.system.http: Flags [.], cksum 0xf2fb (incorrect -> 0x352e), ack 1429, win 113, options [nop,nop,TS val 131835 ecr 1831852808], length 0
20:17:23.633090 IP6 (hlim 64, next-header TCP (6) payload length: 679) bootes.system.http > psion.system.56503: Flags [P.], ack 105, win 45, options [nop,nop,TS val 1831852808 ecr 131832], length 647
20:17:23.633107 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56503 > bootes.system.http: Flags [.], cksum 0xf2fb (incorrect -> 0x327b), ack 2076, win 157, options [nop,nop,TS val 131835 ecr 1831852808], length 0
20:17:23.642284 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56503 > bootes.system.http: Flags [F.], cksum 0xf2fb (incorrect -> 0x3278), seq 105, ack 2076, win 157, options [nop,nop,TS val 131837 ecr 1831852808], length 0
20:17:23.642977 IP6 (hlim 64, next-header TCP (6) payload length: 32) bootes.system.http > psion.system.56503: Flags [F.], cksum 0x32dd (correct), seq 2076, ack 106, win 45, options [nop,nop,TS val 1831852818 ecr 131837], length 0
20:17:23.643018 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56503 > bootes.system.http: Flags [.], cksum 0xf2fb (incorrect -> 0x326d), ack 2077, win 157, options [nop,nop,TS val 131837 ecr 1831852818], length 0


[-- Attachment #5: dump.psion.git --]
[-- Type: text/plain, Size: 4007 bytes --]

20:23:42.349445 IP6 (hlim 64, next-header UDP (17) payload length: 39) psion.system.56211 > orion.system.domain: 47986+[|domain]
20:23:42.350901 IP6 (hlim 64, next-header UDP (17) payload length: 97) orion.system.domain > psion.system.56211: 47986* 1/1/1 [|domain]
20:23:42.351249 IP6 (hlim 64, next-header UDP (17) payload length: 39) psion.system.58999 > orion.system.domain: 20696+[|domain]
20:23:42.352112 IP6 (hlim 64, next-header UDP (17) payload length: 97) orion.system.domain > psion.system.58999: 20696* 1/1/1 [|domain]
20:23:42.352524 IP6 (hlim 64, next-header TCP (6) payload length: 40) psion.system.56995 > bootes.system.http: Flags [S], seq 531795977, win 14400, options [mss 1440,sackOK,TS val 4294944543 ecr 0,[|tcp]>
20:23:42.352924 IP6 (hlim 64, next-header TCP (6) payload length: 40) bootes.system.http > psion.system.56995: Flags [S.], seq 3711662312, ack 531795978, win 5712, options [mss 1440,sackOK,TS val 1832231527 ecr 4294944543,[|tcp]>
20:23:42.352955 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56995 > bootes.system.http: Flags [.], cksum 0xf2fb (incorrect -> 0xe774), ack 1, win 225, options [nop,nop,TS val 4294944543 ecr 1832231527], length 0
20:23:42.353263 IP6 (hlim 64, next-header TCP (6) payload length: 136) psion.system.56995 > bootes.system.http: Flags [P.], ack 1, win 225, options [nop,nop,TS val 4294944543 ecr 1832231527], length 104
20:23:42.354069 IP6 (hlim 64, next-header TCP (6) payload length: 32) bootes.system.http > psion.system.56995: Flags [.], cksum 0xe7bf (correct), ack 105, win 45, options [nop,nop,TS val 1832231528 ecr 4294944543], length 0
20:23:42.359591 IP6 (hlim 64, next-header TCP (6) payload length: 1460) bootes.system.http > psion.system.56995: Flags [.], ack 105, win 45, options [nop,nop,TS val 1832231534 ecr 4294944543], length 1428
20:23:42.359642 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56995 > bootes.system.http: Flags [.], cksum 0xf2fb (incorrect -> 0xe142), ack 1429, win 270, options [nop,nop,TS val 4294944545 ecr 1832231534], length 0
20:23:42.359668 IP6 (hlim 64, next-header TCP (6) payload length: 679) bootes.system.http > psion.system.56995: Flags [P.], ack 105, win 45, options [nop,nop,TS val 1832231534 ecr 4294944543], length 647
20:23:42.359685 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56995 > bootes.system.http: Flags [.], cksum 0xf2fb (incorrect -> 0xde8e), ack 2076, win 315, options [nop,nop,TS val 4294944545 ecr 1832231534], length 0
20:23:42.372455 IP6 (hlim 64, next-header UDP (17) payload length: 98) psion.system.60367 > orion.system.domain: 29336+[|domain]
20:23:42.373200 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56995 > bootes.system.http: Flags [F.], cksum 0xf2fb (incorrect -> 0xde8a), seq 105, ack 2076, win 315, options [nop,nop,TS val 4294944548 ecr 1832231534], length 0
20:23:42.373771 IP6 (hlim 64, next-header UDP (17) payload length: 182) orion.system.domain > psion.system.60367: 29336*[|domain]
20:23:42.373793 IP6 (hlim 64, next-header TCP (6) payload length: 32) bootes.system.http > psion.system.56995: Flags [F.], cksum 0xdf89 (correct), seq 2076, ack 106, win 45, options [nop,nop,TS val 1832231548 ecr 4294944548], length 0
20:23:42.373821 IP6 (hlim 64, next-header TCP (6) payload length: 32) psion.system.56995 > bootes.system.http: Flags [.], cksum 0xf2fb (incorrect -> 0xde7b), ack 2077, win 315, options [nop,nop,TS val 4294944548 ecr 1832231548], length 0
20:23:42.378616 IP6 (hlim 64, next-header UDP (17) payload length: 98) psion.system.52144 > orion.system.domain: 9105+[|domain]
20:23:42.379782 IP6 (hlim 64, next-header UDP (17) payload length: 188) orion.system.domain > psion.system.52144: 9105*[|domain]
20:23:42.380951 IP6 (hlim 64, next-header UDP (17) payload length: 98) psion.system.48731 > orion.system.domain: 3203+[|domain]
20:23:42.382073 IP6 (hlim 64, next-header UDP (17) payload length: 189) orion.system.domain > psion.system.48731: 3203*[|domain]


^ permalink raw reply

* [PATCH] rtlwifi: Fix possible NULL dereference
From: Larry Finger @ 2011-01-21 19:40 UTC (permalink / raw)
  To: John W Linville; +Cc: chaoming_li, linux-kernel, linux-wireless, netdev

From: Jesper Juhl <jj@chaosbits.net>

In drivers/net/wireless/rtlwifi/pci.c::_rtl_pci_rx_interrupt() we call 
dev_alloc_skb(), which may fail and return NULL, but we do not check the 
returned value against NULL before dereferencing the returned pointer. 
This may lead to a NULL pointer dereference which means we'll crash - not 
good.

In a separate call to dev_alloc_skb(), the debug level is changed so that
the failure message will always be logged.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---

John,

Material for 2.6.38.

Larry

Index: wireless-testing/drivers/net/wireless/rtlwifi/pci.c
===================================================================
--- wireless-testing.orig/drivers/net/wireless/rtlwifi/pci.c
+++ wireless-testing/drivers/net/wireless/rtlwifi/pci.c
@@ -619,6 +619,13 @@ static void _rtl_pci_rx_interrupt(struct
 					struct sk_buff *uskb = NULL;
 					u8 *pdata;
 					uskb = dev_alloc_skb(skb->len + 128);
+					if (!uskb) {
+						RT_TRACE(rtlpriv,
+							(COMP_INTR | COMP_RECV),
+							DBG_EMERG,
+							("can't alloc rx skb\n"));
+						goto done;
+					}
 					memcpy(IEEE80211_SKB_RXCB(uskb),
 							&rx_status,
 							sizeof(rx_status));
@@ -641,7 +648,7 @@ static void _rtl_pci_rx_interrupt(struct
 			new_skb = dev_alloc_skb(rtlpci->rxbuffersize);
 			if (unlikely(!new_skb)) {
 				RT_TRACE(rtlpriv, (COMP_INTR | COMP_RECV),
-					 DBG_DMESG,
+					 DBG_EMERG,
 					 ("can't alloc skb for rx\n"));
 				goto done;
 			}
@@ -1066,9 +1073,9 @@ static int _rtl_pci_init_rx_ring(struct
 			struct sk_buff *skb =
 			    dev_alloc_skb(rtlpci->rxbuffersize);
 			u32 bufferaddress;
-			entry = &rtlpci->rx_ring[rx_queue_idx].desc[i];
 			if (!skb)
 				return 0;
+			entry = &rtlpci->rx_ring[rx_queue_idx].desc[i];
 
 			/*skb->dev = dev; */
 

^ permalink raw reply

* Re: [PATCH 1/4] ppp: Clean up kernel log messages.
From: Joe Perches @ 2011-01-21 19:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, paulus
In-Reply-To: <20110120.235601.139098188.davem@davemloft.net>

On Thu, 2011-01-20 at 23:56 -0800, David Miller wrote:
> Use netdev_*() and pr_*().

Perhaps it's better to standardize on "PPP: " or "ppp: "
for these outputs.

Maybe use pr_fmt(fmt) "ppp: " fmt or add a function like:

void ppp_printk(const char *level, const struct ppp *ppp, const char *fmt, ...)
{
	va_list args;
	struct va_format vaf;

	va_start(args, fmt);

	vaf.fmt = fmt;
	vaf.va = &args;
	if (ppp && ppp->dev)
		netdev_printk(level, ppp->dev, "ppp: %pV", &vaf);
	else
		printk("%sppp: %pV", level, &vaf);

	va_end(args);
}
and
#define ppp_err(ppp, fmt, ...) ppp_printk(KERN_ERR, ppp, fmt, ##__VA_ARGS__)
etc...
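
For illustration only, here is a user-space sketch of the single-prefix idea suggested above. ppp_format() and its buffer-based interface are hypothetical stand-ins: the kernel's %pV and netdev_printk() do not exist outside the kernel, so the sketch uses vsnprintf() and takes a plain interface name instead of a struct ppp. It only shows that every message gets the same "ppp: " prefix, with the device name prepended when one is known.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for a ppp_printk()-style helper.
 * Writes "<ifname>: ppp: <message>" into buf, or "ppp: <message>" when
 * ifname is NULL. Returns the total formatted length as snprintf() would,
 * or a negative value on a formatting error.
 */
static int ppp_format(char *buf, size_t len, const char *ifname,
		      const char *fmt, ...)
{
	va_list args;	/* va_list is a typedef, not a struct tag */
	int off, n;

	if (ifname)
		off = snprintf(buf, len, "%s: ppp: ", ifname);
	else
		off = snprintf(buf, len, "ppp: ");
	if (off < 0 || (size_t)off >= len)
		return off;	/* error, or prefix already filled the buffer */

	va_start(args, fmt);
	n = vsnprintf(buf + off, len - (size_t)off, fmt, args);
	va_end(args);	/* must name the same va_list as va_start() */

	return n < 0 ? n : off + n;
}
```

With a 64-byte buffer, ppp_format(buf, sizeof(buf), "ppp0", "compressor dropped pkt") yields a "ppp0: ppp: " prefixed line, and passing NULL for the name falls back to the bare "ppp: " prefix, which mirrors the netdev_printk()/printk() split in the proposal.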

> diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
> -			printk(KERN_DEBUG "PPPIOCDETACH file->f_count=%ld\n",
> -			       atomic_long_read(&file->f_count));
> +			pr_warn("PPPIOCDETACH file->f_count=%ld\n",
[]
> -		printk(KERN_ERR "PPP: not interface or channel??\n");
> +		pr_err("PPP: not interface or channel??\n");
[]
>  		if (net_ratelimit())
> -			printk(KERN_ERR "ppp: compressor dropped pkt\n");
> +			netdev_err(ppp->dev, "ppp: compressor dropped pkt\n");
[]
> -				printk(KERN_DEBUG "PPP: outbound frame not passed\n");
> +				netdev_printk(KERN_DEBUG, ppp->dev,
> +					      "PPP: outbound frame "
> +					      "not passed\n");
[]
> -				printk(KERN_ERR "ppp: compression required but down - pkt dropped.\n");
> +				netdev_err(ppp->dev,
> +					   "ppp: compression required but "
> +					   "down - pkt dropped.\n");

etc.


^ permalink raw reply

* Re: 2.6.38-rc1: arp triggers RTNL assertion
From: Eric Dumazet @ 2011-01-21 18:52 UTC (permalink / raw)
  To: Jamie Heilman, David Miller; +Cc: linux-kernel, netdev
In-Reply-To: <1295593946.2613.52.camel@edumazet-laptop>

Le vendredi 21 janvier 2011 à 08:12 +0100, Eric Dumazet a écrit :
> Le jeudi 20 janvier 2011 à 22:17 -0800, Jamie Heilman a écrit :
> > With 2.6.38-rc1 when I run: arp -Ds 192.168.2.41 eth0 pub
> > I see:
> > 
> > RTNL: assertion failed at net/core/neighbour.c (589)
> > Pid: 2330, comm: arp Not tainted 2.6.38-rc1-00132-g8d99641-dirty #1
> > Call Trace:
> >  [<c11ed339>] ? pneigh_lookup+0xc3/0x168
> >  [<c1219f27>] ? arp_req_set+0x86/0x1d5
> >  [<c11e74b5>] ? dev_get_by_name_rcu+0x72/0x7f
> >  [<c121a1a3>] ? arp_ioctl+0x12d/0x22e
> >  [<c121dfeb>] ? inet_ioctl+0x82/0xa7
> >  [<c11d8ffc>] ? sock_ioctl+0x1b7/0x1db
> >  [<c11d8e45>] ? sock_ioctl+0x0/0x1db
> >  [<c108f02f>] ? do_vfs_ioctl+0x47c/0x4c5
> >  [<c101803c>] ? do_page_fault+0x315/0x341
> >  [<c11daaf3>] ? sys_socket+0x44/0x5a
> >  [<c11dab71>] ? sys_socketcall+0x68/0x270
> >  [<c108f0ab>] ? sys_ioctl+0x33/0x4b
> >  [<c1002897>] ? sysenter_do_call+0x12/0x26
> > 
> > Figured I'd Cc Eric as this could be related to commit 941666c2,
> > "net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()"
> > 
> > Config attached, just in case (the uncommited change, in the tree this
> > kernel was built from, is just Chuck Lever's recent nfs3xdr.c patch).
> 
> Thanks for the report, I am looking at this right now.
> 
> 

Here is how I fixed this. Thanks again, Jamie!

[PATCH] net: neighbour: pneigh_lookup() doesnt need RTNL

Commit 941666c2 "net: RCU conversion of dev_getbyhwaddr() and
arp_ioctl()" introduced a regression, reported by Jamie Heilman.
"arp -Ds 192.168.2.41 eth0 pub" triggered the ASSERT_RTNL() assert.

Relax pneigh_lookup() to not require RTNL being held, using the tbl
rwlock, in read or write mode for the whole function duration.

Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/neighbour.c |   17 ++++++++++-------
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 799f06e..6b96b2c 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -578,17 +578,18 @@ struct pneigh_entry * pneigh_lookup(struct neigh_table *tbl,
 	int key_len = tbl->key_len;
 	u32 hash_val = pneigh_hash(pkey, key_len);
 
-	read_lock_bh(&tbl->lock);
+	if (creat)
+		write_lock_bh(&tbl->lock);
+	else
+		read_lock_bh(&tbl->lock);
+
 	n = __pneigh_lookup_1(tbl->phash_buckets[hash_val],
 			      net, pkey, key_len, dev);
-	read_unlock_bh(&tbl->lock);
 
 	if (n || !creat)
 		goto out;
 
-	ASSERT_RTNL();
-
-	n = kmalloc(sizeof(*n) + key_len, GFP_KERNEL);
+	n = kmalloc(sizeof(*n) + key_len, GFP_ATOMIC);
 	if (!n)
 		goto out;
 
@@ -607,11 +608,13 @@ struct pneigh_entry * pneigh_lookup(struct neigh_table *tbl,
 		goto out;
 	}
 
-	write_lock_bh(&tbl->lock);
 	n->next = tbl->phash_buckets[hash_val];
 	tbl->phash_buckets[hash_val] = n;
-	write_unlock_bh(&tbl->lock);
 out:
+	if (creat)
+		write_unlock_bh(&tbl->lock);
+	else
+		read_unlock_bh(&tbl->lock);
 	return n;
 }
 EXPORT_SYMBOL(pneigh_lookup);

^ permalink raw reply related

* Re: [PATCH v2] mac80211:  Optimize scans on current operating channel.
From: Ben Greear @ 2011-01-21 18:06 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1295632988-7227-1-git-send-email-greearb@candelatech.com>

On 01/21/2011 10:03 AM, greearb@candelatech.com wrote:
> From: Ben Greear<greearb@candelatech.com>
>
> This should decrease un-necessary flushes, on/off channel work,
> and channel changes in cases where the only scanned channel is
> the current operating channel.

Sorry..meant to send this to linux-wireless....

Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: Flow Control and Port Mirroring Revisited
From: Rick Jones @ 2011-01-21 18:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Simon Horman, Jesse Gross, Rusty Russell, virtualization, dev,
	virtualization, netdev, kvm
In-Reply-To: <20110121095929.GE26070@redhat.com>

>>I have constructed a test where I run an un-paced  UDP_STREAM test in
>>one guest and a paced omni rr test in another guest at the same time.
> 
> 
> Hmm, what is this supposed to measure?  Basically each time you run an
> un-paced UDP_STREAM you get some random load on the network.

Well, if the netperf is (effectively) pinned to a given CPU, presumably it would 
be trying to generate UDP datagrams at the same rate each time.  Indeed, though, 
there is no guarantee that rate would consistently get through each time.

But then, that is where one can use the confidence interval options to get an 
idea of how much the rate varied.

rick jones

^ permalink raw reply

* [PATCH v2] mac80211:  Optimize scans on current operating channel.
From: greearb @ 2011-01-21 18:03 UTC (permalink / raw)
  To: netdev; +Cc: Ben Greear

From: Ben Greear <greearb@candelatech.com>

This should decrease un-necessary flushes, on/off channel work,
and channel changes in cases where the only scanned channel is
the current operating channel.

Signed-off-by: Ben Greear <greearb@candelatech.com>
---

v2:  Check channels instead of flag when determining if we should
  do a channel change in scan_completed_finish.

:100644 100644 c47d7c0... 59fe5e7... M	net/mac80211/ieee80211_i.h
:100644 100644 1236710... e6de0e7... M	net/mac80211/rx.c
:100644 100644 3e660db... fa0aeb9... M	net/mac80211/scan.c
 net/mac80211/ieee80211_i.h |    5 +++++
 net/mac80211/rx.c          |   11 ++++++++---
 net/mac80211/scan.c        |   41 +++++++++++++++++++++++++----------------
 3 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index c47d7c0..59fe5e7 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -660,6 +660,10 @@ struct tpt_led_trigger {
  *	that the scan completed.
  * @SCAN_ABORTED: Set for our scan work function when the driver reported
  *	a scan complete for an aborted scan.
+ * @SCAN_LEFT_OPER_CHANNEL:  Set this flag if the scan process leaves the
+ *      operating channel at any time.  If scanning ONLY the current operating
+ *      channel this flag should not be set, and this will allow fewer
+ *      offchannel changes.
  */
 enum {
 	SCAN_SW_SCANNING,
@@ -667,6 +671,7 @@ enum {
 	SCAN_OFF_CHANNEL,
 	SCAN_COMPLETED,
 	SCAN_ABORTED,
+	SCAN_LEFT_OPER_CHANNEL,
 };
 
 /**
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 1236710..e6de0e7 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -388,6 +388,7 @@ ieee80211_rx_h_passive_scan(struct ieee80211_rx_data *rx)
 	struct ieee80211_local *local = rx->local;
 	struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(rx->skb);
 	struct sk_buff *skb = rx->skb;
+	int ret;
 
 	if (likely(!(status->rx_flags & IEEE80211_RX_IN_SCAN)))
 		return RX_CONTINUE;
@@ -396,10 +397,14 @@ ieee80211_rx_h_passive_scan(struct ieee80211_rx_data *rx)
 		return ieee80211_scan_rx(rx->sdata, skb);
 
 	if (test_bit(SCAN_SW_SCANNING, &local->scanning)) {
-		/* drop all the other packets during a software scan anyway */
-		if (ieee80211_scan_rx(rx->sdata, skb) != RX_QUEUED)
+		ret = ieee80211_scan_rx(rx->sdata, skb);
+		/* drop all the other packets while scanning off channel */
+		if (ret != RX_QUEUED &&
+		    test_bit(SCAN_OFF_CHANNEL, &local->scanning)) {
 			dev_kfree_skb(skb);
-		return RX_QUEUED;
+			return RX_QUEUED;
+		}
+		return ret;
 	}
 
 	/* scanning finished during invoking of handlers */
diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
index 3e660db..fa0aeb9 100644
--- a/net/mac80211/scan.c
+++ b/net/mac80211/scan.c
@@ -293,11 +293,14 @@ static void __ieee80211_scan_completed_finish(struct ieee80211_hw *hw,
 {
 	struct ieee80211_local *local = hw_to_local(hw);
 
-	ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_CHANNEL);
+	if ((local->oper_channel != local->hw.conf.channel) || was_hw_scan)
+		ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_CHANNEL);
+
 	if (!was_hw_scan) {
 		ieee80211_configure_filter(local);
 		drv_sw_scan_complete(local);
-		ieee80211_offchannel_return(local, true);
+		if (test_bit(SCAN_LEFT_OPER_CHANNEL, &local->scanning))
+			ieee80211_offchannel_return(local, true);
 	}
 
 	mutex_lock(&local->mtx);
@@ -397,13 +400,10 @@ static int ieee80211_start_sw_scan(struct ieee80211_local *local)
 
 	drv_sw_scan_start(local);
 
-	ieee80211_offchannel_stop_beaconing(local);
-
 	local->leave_oper_channel_time = 0;
 	local->next_scan_state = SCAN_DECISION;
 	local->scan_channel_idx = 0;
-
-	drv_flush(local, false);
+	__clear_bit(SCAN_LEFT_OPER_CHANNEL, &local->scanning);
 
 	ieee80211_configure_filter(local);
 
@@ -543,7 +543,18 @@ static void ieee80211_scan_state_decision(struct ieee80211_local *local,
 	}
 	mutex_unlock(&local->iflist_mtx);
 
-	if (local->scan_channel) {
+	next_chan = local->scan_req->channels[local->scan_channel_idx];
+
+	if (local->oper_channel == local->hw.conf.channel) {
+		if (next_chan == local->oper_channel)
+			local->next_scan_state = SCAN_SET_CHANNEL;
+		else
+			/*
+			 * we're on the operating channel currently, let's
+			 * leave that channel now to scan another one
+			 */
+			local->next_scan_state = SCAN_LEAVE_OPER_CHANNEL;
+	} else {
 		/*
 		 * we're currently scanning a different channel, let's
 		 * see if we can scan another channel without interfering
@@ -559,7 +570,6 @@ static void ieee80211_scan_state_decision(struct ieee80211_local *local,
 		 *
 		 * Otherwise switch back to the operating channel.
 		 */
-		next_chan = local->scan_req->channels[local->scan_channel_idx];
 
 		bad_latency = time_after(jiffies +
 				ieee80211_scan_get_channel_time(next_chan),
@@ -577,12 +587,6 @@ static void ieee80211_scan_state_decision(struct ieee80211_local *local,
 			local->next_scan_state = SCAN_ENTER_OPER_CHANNEL;
 		else
 			local->next_scan_state = SCAN_SET_CHANNEL;
-	} else {
-		/*
-		 * we're on the operating channel currently, let's
-		 * leave that channel now to scan another one
-		 */
-		local->next_scan_state = SCAN_LEAVE_OPER_CHANNEL;
 	}
 
 	*next_delay = 0;
@@ -591,9 +595,12 @@ static void ieee80211_scan_state_decision(struct ieee80211_local *local,
 static void ieee80211_scan_state_leave_oper_channel(struct ieee80211_local *local,
 						    unsigned long *next_delay)
 {
+	ieee80211_offchannel_stop_beaconing(local);
+
 	ieee80211_offchannel_stop_station(local);
 
 	__set_bit(SCAN_OFF_CHANNEL, &local->scanning);
+	__set_bit(SCAN_LEFT_OPER_CHANNEL, &local->scanning);
 
 	/*
 	 * What if the nullfunc frames didn't arrive?
@@ -640,8 +647,10 @@ static void ieee80211_scan_state_set_channel(struct ieee80211_local *local,
 	chan = local->scan_req->channels[local->scan_channel_idx];
 
 	local->scan_channel = chan;
-	if (ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_CHANNEL))
-		skip = 1;
+
+	if (chan != local->hw.conf.channel)
+		if (ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_CHANNEL))
+			skip = 1;
 
 	/* advance state machine to next channel/band */
 	local->scan_channel_idx++;
-- 
1.7.2.3


^ permalink raw reply related

* Re: [PATCH net-next-2.6] net: netif_setup_tc() is static
From: John Fastabend @ 2011-01-21 17:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, bhutchings@solarflare.com, jarkao2@gmail.com,
	hadi@cyberus.ca, shemminger@vyatta.com, tgraf@infradead.org,
	nhorman@tuxdriver.com, netdev@vger.kernel.org
In-Reply-To: <1295587088.2613.51.camel@edumazet-laptop>

On 1/20/2011 9:18 PM, Eric Dumazet wrote:
> Le mercredi 19 janvier 2011 à 23:41 -0800, David Miller a écrit :
>> From: John Fastabend <john.r.fastabend@intel.com>
>> Date: Mon, 17 Jan 2011 10:06:04 -0800
>>
>>> This patch provides a mechanism for lower layer devices to
>>> steer traffic using skb->priority to tx queues.
>>  ...
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>
>> Applied.
> 
> Hi John
> 
> Should netif_setup_tc() be static, or is it meant to be exported
> somehow ?
> 
> [PATCH net-next-2.6] net: netif_setup_tc() is static
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---

Acked-by: John Fastabend <john.r.fastabend@intel.com>

Yes this should be static. Thanks Eric!


^ permalink raw reply

* Re: [Patch] Kill off warning: ‘inline’ is not at beginning of declaration
From: Joel Becker @ 2011-01-21 17:31 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: alsa-devel, Mauro Carvalho Chehab, Takashi Iwai,
	Frederic Weisbecker, Gustavo F. Padovan, Jaroslav Kysela,
	Jens Axboe, Stephen Hemminger, Andi Kleen, H. Peter Anvin,
	Pekka Savola (ipv6), Robert Richter, x86, James Morris,
	Ingo Molnar, oprofile-list, Alexey Kuznetsov, Mark Fasheh,
	Marcel Holtmann, John W. Linville, Thomas Gleixner, linux-edac, t
In-Reply-To: <alpine.LNX.2.00.1101170000270.13377@swampdragon.chaosbits.net>

On Mon, Jan 17, 2011 at 12:09:38AM +0100, Jesper Juhl wrote:
> Fix a bunch of 
> 	warning: ‘inline’ is not at beginning of declaration
> messages when building a 'make allyesconfig' kernel with -Wextra.
> 
> These warnings are trivial to kill, yet rather annoying when building with 
> -Wextra.
> The more we can cut down on pointless crap like this the better (IMHO).
> 
> A previous patch to do this for a 'allnoconfig' build has already been 
> merged. This just takes the cleanup a little further.
> 
> Signed-off-by: Jesper Juhl <jj@chaosbits.net>

Acked for fs/ocfs2

Joel


^ permalink raw reply

* Re: [PATCH 5/8] af_unix: find the recipients of a multicast group
From: Alban Crequy @ 2011-01-21 17:24 UTC (permalink / raw)
  To: Alban Crequy
  Cc: David S. Miller, Eric Dumazet, Lennart Poettering, netdev,
	linux-kernel, Ian Molton
In-Reply-To: <1295620788-6002-5-git-send-email-alban.crequy@collabora.co.uk>

[drop Cc on linux-doc]

I got this message with my multicast patches:

[  109.314741] =================================
[  109.316007] [ INFO: inconsistent lock state ]
[  109.316007] 2.6.38-rc1+ #14
[  109.316007] ---------------------------------
[  109.316007] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  109.316007] ksoftirqd/1/9 [HC0[0]:SC1[1]:HE0:SE0] takes:
[  109.316007]  (&af_unix_sk_receive_queue_lock_key){+.?...}, at: [<c1256028>] skb_dequeue+0x12/0x4a
[  109.316007] {SOFTIRQ-ON-W} state was registered at:
[  109.316007]   [<c105b9b9>] __lock_acquire+0x2df/0xb95
[  109.316007]   [<c105c334>] lock_acquire+0xc5/0xe6
[  109.316007]   [<c12fd21d>] _raw_spin_lock+0x33/0x40
[  109.316007]   [<e080cbc8>] unix_stream_connect+0x34f/0x3d5 [unix]
[  109.316007]   [<c1250918>] sys_connect+0x7c/0xb2
[  109.316007]   [<c125169e>] sys_socketcall+0xb0/0x289
[  109.316007]   [<c12fdb4c>] syscall_call+0x7/0xb
[  109.316007] irq event stamp: 463879
[  109.316007] hardirqs last  enabled at (463878): [<c10c8d3c>] kmem_cache_free+0xa4/0xe2
[  109.316007] hardirqs last disabled at (463879): [<c12fd2ed>] _raw_spin_lock_irqsave+0x1d/0x57
[  109.316007] softirqs last  enabled at (463638): [<c10385d9>] __do_softirq+0x17c/0x190
[  109.316007] softirqs last disabled at (463641): [<c1004bd3>] do_softirq+0x60/0xb9
[  109.316007] 
[  109.316007] other info that might help us debug this:
[  109.316007] no locks held by ksoftirqd/1/9.
[  109.316007] 
[  109.316007] stack backtrace:
[  109.316007] Pid: 9, comm: ksoftirqd/1 Not tainted 2.6.38-rc1+ #14
[  109.316007] Call Trace:
[  109.316007]  [<c105a70f>] ? valid_state+0x168/0x174
[  109.316007]  [<c105a803>] ? mark_lock+0xe8/0x1e8
[  109.316007]  [<c105aefb>] ? check_usage_forwards+0x0/0x77
[  109.316007]  [<c105b94b>] ? __lock_acquire+0x271/0xb95
[  109.316007]  [<c1059af3>] ? register_lock_class+0x17/0x2a4
[  109.316007]  [<c105a739>] ? mark_lock+0x1e/0x1e8
[  109.316007]  [<c1059787>] ? trace_hardirqs_off+0xb/0xd
[  109.316007]  [<c105ace5>] ? debug_check_no_locks_freed+0x115/0x12d
[  109.316007]  [<c1256028>] ? skb_dequeue+0x12/0x4a
[  109.316007]  [<c105c334>] ? lock_acquire+0xc5/0xe6
[  109.316007]  [<c1256028>] ? skb_dequeue+0x12/0x4a
[  109.316007]  [<c12fd317>] ? _raw_spin_lock_irqsave+0x47/0x57
[  109.316007]  [<c1256028>] ? skb_dequeue+0x12/0x4a
[  109.316007]  [<c1256028>] ? skb_dequeue+0x12/0x4a
[  109.316007]  [<c1256a75>] ? skb_queue_purge+0x14/0x1b
[  109.316007]  [<e080cc62>] ? unix_sock_destructor+0x14/0xb6 [unix]
[  109.316007]  [<c12532fe>] ? __sk_free+0x17/0x13f
[  109.316007]  [<c105ab89>] ? trace_hardirqs_on_caller+0xeb/0x125
[  109.316007]  [<c1253488>] ? sk_free+0x16/0x18
[  109.316007]  [<e0809f74>] ? sock_put+0x13/0x15 [unix]
[  109.316007]  [<e080a107>] ? kfree_sock_set+0x21/0x36 [unix]
[  109.316007]  [<e080a127>] ? sock_set_reclaim+0xb/0xd [unix]
[  109.316007]  [<c1080068>] ? __rcu_process_callbacks+0x176/0x26b
[  109.316007]  [<c108017b>] ? rcu_process_callbacks+0x1e/0x3b
[  109.316007]  [<c103850e>] ? __do_softirq+0xb1/0x190
[  109.316007]  [<c103845d>] ? __do_softirq+0x0/0x190
[  109.316007]  <IRQ>  [<c1037d27>] ? run_ksoftirqd+0x57/0xd3
[  109.316007]  [<c1037cd0>] ? run_ksoftirqd+0x0/0xd3
[  109.316007]  [<c104a930>] ? kthread+0x6d/0x72
[  109.316007]  [<c104a8c3>] ? kthread+0x0/0x72
[  109.316007]  [<c1003742>] ? kernel_thread_helper+0x6/0x10

The socket is released and skb is dequeued in a call_rcu() callback:

> +	/* Take the lock to insert the new list but take the opportunity to do
> +	 * some garbage collection on outdated lists */
> +	spin_lock(&unix_multicast_lock);
> +	hlist_for_each_entry_rcu(del_set, pos, &group->mcast_members_lists,
> +			     list) {
> +		if (down_trylock(&del_set->sem)) {
> +			/* the list is being used by someone else */
> +			continue;
> +		}
> +		if (del_set->generation < generation) {
> +			hlist_del_rcu(&del_set->list);
> +			call_rcu(&del_set->rcu, sock_set_reclaim);

The purpose of that chunk is to release outdated struct sock_set
instances early rather than waiting for destroy_mcast_group(), so that
senders of multicast messages don't have to iterate over outdated sets
when they look for an available set of sockets.
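
To make the intent of that garbage collection concrete, here is a
minimal userspace sketch of the same rule. It is only a model: struct
set_model and reclaim_outdated() are hypothetical names, the "busy" flag
stands in for the semaphore, and clearing "linked" stands in for
hlist_del_rcu() followed by call_rcu() in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock_set (not the kernel structure). */
struct set_model {
	int generation;   /* membership generation the set was built for */
	bool busy;        /* models the semaphore being held by a sender */
	bool linked;      /* still on the group's list of sets */
};

/* Unlinks every idle set older than 'generation'; returns how many. */
static size_t reclaim_outdated(struct set_model *sets, size_t n, int generation)
{
	size_t reclaimed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!sets[i].linked || sets[i].busy)
			continue;   /* already gone, or in use by a sender */
		if (sets[i].generation < generation) {
			sets[i].linked = false;   /* deferred free in the real patch */
			reclaimed++;
		}
	}
	return reclaimed;
}
```

A set that is busy is skipped and only becomes eligible on a later pass,
which matches the down_trylock() check in the quoted hunk.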

In af_unix.c, lockdep annotations (a09785a2):
/*
 * AF_UNIX sockets do not interact with hardware, hence they
 * dont trigger interrupts - so it's safe for them to have
 * bh-unsafe locking for their sk_receive_queue.lock. Split off
 * this special lock-class by reinitializing the spinlock key:
 */
static struct lock_class_key af_unix_sk_receive_queue_lock_key;

       lockdep_set_class(&sk->sk_receive_queue.lock,
                               &af_unix_sk_receive_queue_lock_key);


I don't know if I should avoid releasing sockets in RCU callbacks or
update the lockdep annotations.

-- 
Alban

^ permalink raw reply

* Typo in ip man page
From: Jan Wulfes @ 2011-01-21 17:23 UTC (permalink / raw)
  To: netdev

I fixed a typo in the ip(8) man page.

Please find the output of git request-pull below:

-- 

The following changes since commit 9351fec72d2bb4e7501c12949855ab252b037bce:

  Update to lasest kernel headers (2011-01-12 18:46:54 -0800)

are available in the git repository at:
  https://klobs@github.com/klobs/iproute2.git master

Jan Wulfes (1):
      Fixed typo

 man/man8/ip.8 |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


^ permalink raw reply

* Re: [PATCH v2] e1000e: convert to stats64
From: Flavio Leitner @ 2011-01-21 17:03 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: e1000-devel, netdev
In-Reply-To: <AANLkTi=y4u1DTtOX9-70CeDGhWGu4bBKK0Y32Js41=bK@mail.gmail.com>

On Thu, Dec 16, 2010 at 07:14:35PM -0800, Jeff Kirsher wrote:
> On Thu, Dec 16, 2010 at 04:31, Flavio Leitner <fleitner@redhat.com> wrote:
> > On Tue, Dec 14, 2010 at 10:29:33PM +0100, Eric Dumazet wrote:
> >> Le mardi 14 décembre 2010 à 18:32 -0200, Flavio Leitner a écrit :
> >> > Provides accurate stats at the time user reads them.
> >> >
> >> > Signed-off-by: Flavio Leitner <fleitner@redhat.com>
> >> > ---
> >> >  drivers/net/e1000e/e1000.h   |    5 ++-
> >> >  drivers/net/e1000e/ethtool.c |   27 +++++++++-------
> >> >  drivers/net/e1000e/netdev.c  |   68 ++++++++++++++++++++++++-----------------
> >> >  3 files changed, 59 insertions(+), 41 deletions(-)
> >> >
> >> > diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
> >> > index fdc67fe..5a5e944 100644
> >> > --- a/drivers/net/e1000e/e1000.h
> >> > +++ b/drivers/net/e1000e/e1000.h
> >> > @@ -363,6 +363,8 @@ struct e1000_adapter {
> >> >     /* structs defined in e1000_hw.h */
> >> >     struct e1000_hw hw;
> >> >
> >> > +   spinlock_t stats64_lock;
> >> > +   struct rtnl_link_stats64 stats64;
> >>
> >> I am not sure why you add this stats64 in e1000_adapter ?
> >>
> >> Why isnt it provided by callers (automatic variable, or provided to
> >> ndo_get_stats64()). I dont see accumulators, only a full rewrite of this
> >> structure in e1000e_update_stats() ?
> >
> > Good point. I have modified the patch to fix that.
> > thanks!
> >
> > From 3487bd7dacd0c23bba315270139dab6e00e5ff02 Mon Sep 17 00:00:00 2001
> > From: Flavio Leitner <fleitner@redhat.com>
> > Date: Thu, 16 Dec 2010 10:26:03 -0200
> > Subject: [PATCH] e1000e: convert to stats64
> >
> > Provides accurate stats at the time user reads them.
> >
> > Signed-off-by: Flavio Leitner <fleitner@redhat.com>
> > ---
> >  drivers/net/e1000e/e1000.h   |    3 ++
> >  drivers/net/e1000e/ethtool.c |   25 ++++++++-------
> >  drivers/net/e1000e/netdev.c  |   68 +++++++++++++++++++++++++++++++++--------
> >  3 files changed, 70 insertions(+), 26 deletions(-)
> >
> 
> I have dropped your previous version of the patch and applied v2 to my
> tree for review and testing.
> Thanks Flavio!

Hi Jeff,

Do you have any feedback on this patch?
thanks,
-- 
Flavio

^ permalink raw reply

* Re: [PATCH] netfilter: ipvs: fix compiler warnings
From: Patrick McHardy @ 2011-01-21 16:50 UTC (permalink / raw)
  To: Changli Gao
  Cc: Simon Horman, Wensong Zhang, Julian Anastasov, David S. Miller,
	netdev, lvs-devel, netfilter-devel
In-Reply-To: <1295604133-6869-1-git-send-email-xiaosuo@gmail.com>

Am 21.01.2011 11:02, schrieb Changli Gao:
> Fix compiler warnings when no transport protocol load balancing support
> is configured.

Thanks Changli, I'll apply your patch once one of the IPVS developers
ACKs this.

^ permalink raw reply

* Re: [PATCH] e1000: add support for Marvell Alaska M88E1118R PHY
From: Florian Fainelli @ 2011-01-21 16:27 UTC (permalink / raw)
  To: Dirk Brandewie; +Cc: Jeff Kirsher, netdev@vger.kernel.org, David Miller
In-Reply-To: <1295549359.7387.30.camel@localhost.localdomain>

Hello Dirk, Jeff,

On Thursday 20 January 2011 19:49:19 Dirk Brandewie wrote:
> On Wed, 2011-01-19 at 22:51 -0800, Jeff Kirsher wrote:
> > On Wed, Jan 19, 2011 at 01:09, Florian Fainelli <ffainelli@freebox.fr> wrote:
> > > From: Florian Fainelli <ffainelli@freebox.fr>
> > > 
> > > This patch adds support for Marvell Alaska M88E1118R PHY chips. Support
> > > for other M88* PHYs is already there, so there is nothing more to add
> > > than its PHY id.
> > > 
> > > Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
> > > CC: Dirk Brandewie <dirk.j.brandewie@intel.com>
> > > CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > > ---
> > 
> > The patch itself looks fine.  I am concerned about validation.
> > 
> > Dirk - is there a chance that the ce4100 will use this PHY?  If so,
> > can you cover the validation?
> 
> Florian is working on a CE4100 based platform.  It looks like they used
> a different PHY from the Falcon Falls reference platform. I can't
> directly test this patch since I don't have their hardware.  I will be
> testing .38-rc1 next week on Falcon Falls.

Indeed, we use this PHY on our hardware.

> 
> I think the best we can do without the hardware is to compare the data
> sheet for the new PHY with the PHYs already supported and make sure they
> are compatible.  If the datasheets match up for the features the driver
> is using this seems pretty low risk IMHO.

As far as I could check, all M88E111* should behave the same for the setup 
done in e1000.
--
Florian

^ permalink raw reply

* [PATCH 2/8] af_unix: Add constant for unix socket options level
From: Alban Crequy @ 2011-01-21 14:39 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Lennart Poettering, netdev,
	linux-doc, linux-
  Cc: Alban Crequy
In-Reply-To: <20110121143751.57b1453d@chocolatine.cbg.collabora.co.uk>

Assign the next free socket options level to be used by the unix
protocol and address family.

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
---
 include/linux/socket.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index edbb1d0..a257d1c 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -308,6 +308,7 @@ struct ucred {
 #define SOL_IUCV	277
 #define SOL_CAIF	278
 #define SOL_ALG		279
+#define SOL_UNIX	280
 
 /* IPX options */
 #define IPX_TYPE	1
-- 
1.7.2.3


^ permalink raw reply related

* [PATCH 8/8] af_unix: Unsubscribe sockets from their multicast groups on RCV_SHUTDOWN
From: Alban Crequy @ 2011-01-21 14:39 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Lennart Poettering, netdev,
	linux-doc, linux-
  Cc: Alban Crequy
In-Reply-To: <20110121143751.57b1453d@chocolatine.cbg.collabora.co.uk>

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
---
 net/unix/af_unix.c |   35 +++++++++++++++++++++++++++++++++++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 138d9a2..9b281cf 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2820,6 +2820,10 @@ static int unix_shutdown(struct socket *sock, int mode)
 {
 	struct sock *sk = sock->sk;
 	struct sock *other;
+#ifdef CONFIG_UNIX_MULTICAST
+	struct unix_sock *u = unix_sk(sk);
+	int unsubscribed = 0;
+#endif
 
 	mode = (mode+1)&(RCV_SHUTDOWN|SEND_SHUTDOWN);
 
@@ -2831,7 +2835,38 @@ static int unix_shutdown(struct socket *sock, int mode)
 	other = unix_peer(sk);
 	if (other)
 		sock_hold(other);
+
+#ifdef CONFIG_UNIX_MULTICAST
+	/* If the socket subscribed to a multicast group and it is shutdown
+	 * with (mode&RCV_SHUTDOWN), it should be unsubscribed or at least
+	 * stop blocking the peers */
+	if (mode&RCV_SHUTDOWN) {
+		struct unix_mcast *node;
+		struct hlist_node *pos;
+		struct hlist_node *pos_tmp;
+
+		spin_lock(&unix_multicast_lock);
+		hlist_for_each_entry_safe(node, pos, pos_tmp,
+					  &u->mcast_subscriptions,
+					  subscription_node) {
+			hlist_del_rcu(&node->member_node);
+			hlist_del_rcu(&node->subscription_node);
+			atomic_dec(&node->group->mcast_members_cnt);
+			atomic_inc(&node->group->mcast_membership_generation);
+			hlist_add_head_rcu(&node->member_dead_node,
+					   &node->group->mcast_dead_members);
+			unsubscribed = 1;
+		}
+		spin_unlock(&unix_multicast_lock);
+	}
+#endif
 	unix_state_unlock(sk);
+
+#ifdef CONFIG_UNIX_MULTICAST
+	if (unsubscribed)
+		wake_up_interruptible_all(&u->peer_wait);
+#endif
+
 	sk->sk_state_change(sk);
 
 	if (other &&
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH 7/8] af_unix: implement poll(POLLOUT) for multicast sockets
From: Alban Crequy @ 2011-01-21 14:39 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Lennart Poettering, netdev,
	linux-doc, linux-
  Cc: Alban Crequy
In-Reply-To: <20110121143751.57b1453d@chocolatine.cbg.collabora.co.uk>

When a socket subscribed to a multicast group has its incoming queue full, it
can either block the emission to the multicast group or let the messages be
dropped. The latter is useful to monitor all messages without slowing down the
traffic.

It is specified with the flag UNIX_MREQ_DROP_WHEN_FULL when the multicast group
is joined.

poll(POLLOUT) is implemented by checking all receiving queues of subscribed
sockets. If even one of them has its receiving queue full and does not have
UNIX_MREQ_DROP_WHEN_FULL, the multicast socket is not writeable.
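
The writability rule can be sketched in userspace as follows. This is
only a model of the check, not the kernel code: struct member_model and
mcast_writable() are hypothetical names, and the "full" test is a
simplified stand-in for unix_recvq_full(). The flag value is the one
defined in this series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag value taken from this patch series (include/net/af_unix.h). */
#define UNIX_MREQ_DROP_WHEN_FULL 0x04

/* Hypothetical userspace stand-in for one subscribed socket. */
struct member_model {
	unsigned int flags;   /* flags given when the group was joined */
	size_t queued;        /* messages waiting in the receive queue */
	size_t queue_max;     /* capacity of the receive queue */
};

/* Returns true when a sender may write to the group without blocking. */
static bool mcast_writable(const struct member_model *m, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (m[i].flags & UNIX_MREQ_DROP_WHEN_FULL)
			continue;       /* this member never blocks the sender */
		if (m[i].queued >= m[i].queue_max)
			return false;   /* one full blocking member is enough */
	}
	return true;
}
```

A member that joined with UNIX_MREQ_DROP_WHEN_FULL is ignored by the
check even when its queue is full, so a pure monitoring socket cannot
stall the senders.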

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
---
 net/unix/af_unix.c |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 4147d64..138d9a2 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2940,6 +2940,11 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 {
 	struct sock *sk = sock->sk, *other;
 	unsigned int mask, writable;
+#ifdef CONFIG_UNIX_MULTICAST
+	struct sock_set *others;
+	int err = 0;
+	int i;
+#endif
 
 	sock_poll_wait(file, sk_sleep(sk), wait);
 	mask = 0;
@@ -2980,6 +2985,34 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 		sock_put(other);
 	}
 
+#ifdef CONFIG_UNIX_MULTICAST
+	/*
+	 * On multicast sockets, we need to check if the receiving queue is
+	 * full on all peers who don't have UNIX_MREQ_DROP_WHEN_FULL.
+	 */
+	if (!other || !unix_sk(other)->mcast_group)
+		goto skip_multicast;
+	others = unix_find_multicast_recipients(sk,
+		unix_sk(other)->mcast_group, &err);
+	if (!others)
+		goto skip_multicast;
+	for (i = others->offset ; i < others->cnt ; i++) {
+		if (others->items[i].flags & UNIX_MREQ_DROP_WHEN_FULL)
+			continue;
+		if (unix_peer(others->items[i].s) != sk) {
+			sock_poll_wait(file,
+				&unix_sk(others->items[i].s)->peer_wait, wait);
+			if (unix_recvq_full(others->items[i].s)) {
+				writable = 0;
+				break;
+			}
+		}
+	}
+	up_sock_set(others);
+
+skip_multicast:
+#endif
+
 	if (writable)
 		mask |= POLLOUT | POLLWRNORM | POLLWRBAND;
 	else
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH 6/8] af_unix: Deliver message to several recipients in case of multicast
From: Alban Crequy @ 2011-01-21 14:39 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Lennart Poettering, netdev,
	linux-doc, linux-
  Cc: Alban Crequy, Ian Molton
In-Reply-To: <20110121143751.57b1453d@chocolatine.cbg.collabora.co.uk>

unix_dgram_sendmsg() implements the delivery both for SOCK_DGRAM and
SOCK_SEQPACKET unix sockets.

The delivery is done in an atomic way; either the message is delivered to all
recipients or none, even in case of interruptions or errors.
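
As a rough illustration of the all-or-none behaviour, here is a
two-phase sketch in userspace C. It is a simplified model under stated
assumptions: struct rcpt_model and mcast_send_all_or_none() are
hypothetical names, queues are plain counters, and a full blocking
recipient makes the call fail immediately (the non-blocking case),
whereas the patch may also wait for the peer and restart.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag value taken from this patch series (include/net/af_unix.h). */
#define UNIX_MREQ_DROP_WHEN_FULL 0x04

/* Hypothetical stand-in for one recipient of a multicast message. */
struct rcpt_model {
	unsigned int flags;   /* flags given when the group was joined */
	int queued;           /* messages currently queued */
	int queue_max;        /* capacity of the receive queue */
	bool to_deliver;      /* scratch field, recomputed on every send */
};

/* Returns 0 on success, -1 when delivery would block (nothing is queued). */
static int mcast_send_all_or_none(struct rcpt_model *r, size_t n)
{
	/* Phase 1: decide per recipient without touching any queue. */
	for (size_t i = 0; i < n; i++) {
		bool full = r[i].queued >= r[i].queue_max;

		if (!full)
			r[i].to_deliver = true;
		else if (r[i].flags & UNIX_MREQ_DROP_WHEN_FULL)
			r[i].to_deliver = false;   /* dropped for this member only */
		else
			return -1;   /* abort: no queue has been modified yet */
	}

	/* Phase 2: commit to every recipient selected in phase 1. */
	for (size_t i = 0; i < n; i++)
		if (r[i].to_deliver)
			r[i].queued++;
	return 0;
}
```

Keeping the checks and the queueing in separate passes is what lets an
error in the first pass leave every receive queue untouched.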

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Signed-off-by: Ian Molton <ian.molton@collabora.co.uk>
---
 net/unix/af_unix.c |  242 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 242 insertions(+), 0 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index fe0d3bb..4147d64 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1715,6 +1715,210 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen
 	return err;
 }
 
+#ifdef CONFIG_UNIX_MULTICAST
+static void kfree_skb_sock_set(struct sock_set *set)
+{
+	int i;
+	for (i = set->offset ; i < set->cnt ; i++) {
+		if (set->items[i].skb) {
+			kfree_skb(set->items[i].skb);
+			set->items[i].skb = NULL;
+		}
+	}
+}
+
+static void unix_mcast_lock(struct unix_mcast_group *group,
+			    struct sock_set *set)
+{
+	int i;
+	for (i = 0 ; i < MCAST_LOCK_CLASS_COUNT ; i++) {
+		if (set->hash & (1 << i))
+			spin_lock_nested(&group->lock[i], i);
+	}
+}
+
+static void unix_mcast_unlock(struct unix_mcast_group *group,
+			      struct sock_set *set)
+{
+	int i;
+	for (i = MCAST_LOCK_CLASS_COUNT - 1 ; i >= 0 ; i--) {
+		if (set->hash & (1 << i))
+			spin_unlock(&group->lock[i]);
+	}
+}
+
+
+static int unix_dgram_sendmsg_multicast(struct sock_iocb *siocb,
+					struct sock *sk,
+					struct sk_buff *skb,
+					struct unix_mcast_group *group,
+					struct sock_set *others_set,
+					size_t len,
+					int max_level,
+					long timeo)
+{
+	int err;
+	int i;
+
+	BUG_ON(!others_set);
+
+restart:
+	for (i = others_set->offset ; i < others_set->cnt ; i++) {
+		struct sock *cur = others_set->items[i].s;
+		unsigned int pkt_len;
+		struct sk_filter *filter;
+
+		if (!others_set->items[i].to_deliver)
+			continue;
+
+		BUG_ON(others_set->items[i].skb);
+		BUG_ON(cur == NULL);
+
+		rcu_read_lock();
+		filter = rcu_dereference(cur->sk_filter);
+		if (filter)
+			pkt_len = sk_run_filter(skb, filter->insns);
+		else
+			pkt_len = 0xffffffff;
+		rcu_read_unlock();
+
+		if (pkt_len == 0) {
+			others_set->items[i].to_deliver = 0;
+			continue;
+		}
+
+		others_set->items[i].skb = skb_clone(skb, GFP_KERNEL);
+		if (!others_set->items[i].skb) {
+			kfree_skb_sock_set(others_set);
+			err = -ENOMEM;
+			goto out_free;
+		}
+		skb_set_owner_w(others_set->items[i].skb, sk);
+		err = unix_scm_to_skb(siocb->scm, others_set->items[i].skb,
+				      true);
+		if (err < 0)
+			goto out_free;
+		unix_get_secdata(siocb->scm, others_set->items[i].skb);
+		pskb_trim(others_set->items[i].skb, pkt_len);
+	}
+
+	for (i = others_set->offset ; i < others_set->cnt ; i++) {
+		struct sock *cur = others_set->items[i].s;
+
+		if (!others_set->items[i].to_deliver)
+			continue;
+
+		unix_state_lock(cur);
+
+		if (cur->sk_shutdown & RCV_SHUTDOWN) {
+			unix_state_unlock(cur);
+			kfree_skb(others_set->items[i].skb);
+			others_set->items[i].skb = NULL;
+				others_set->items[i].to_deliver = 0;
+				continue;
+		}
+
+		if (sk->sk_type != SOCK_SEQPACKET) {
+			err = security_unix_may_send(sk->sk_socket,
+						     cur->sk_socket);
+			if (err) {
+				unix_state_unlock(cur);
+				kfree_skb(others_set->items[i].skb);
+				others_set->items[i].skb = NULL;
+					others_set->items[i].to_deliver = 0;
+					continue;
+			}
+		}
+
+		if (unix_peer(cur) != sk && unix_recvq_full(cur)) {
+			kfree_skb(others_set->items[i].skb);
+			others_set->items[i].skb = NULL;
+
+			if (others_set->items[i].flags
+					& UNIX_MREQ_DROP_WHEN_FULL) {
+				/* Drop the skbs and continue */
+				unix_state_unlock(cur);
+				others_set->items[i].to_deliver = 0;
+				continue;
+			} else {
+				if (!timeo) {
+					unix_state_unlock(cur);
+					err = -EAGAIN;
+					goto out_free;
+				}
+
+				timeo = unix_wait_for_peer(cur, timeo);
+
+				err = sock_intr_errno(timeo);
+				if (signal_pending(current))
+					goto out_free;
+
+				kfree_skb_sock_set(others_set);
+				goto restart;
+			}
+		}
+		unix_state_unlock(cur);
+	}
+
+	unix_mcast_lock(group, others_set);
+	for (i = others_set->offset ; i < others_set->cnt ; i++) {
+		struct sock *cur = others_set->items[i].s;
+
+		if (!others_set->items[i].to_deliver)
+			continue;
+
+		BUG_ON(cur == NULL);
+		BUG_ON(others_set->items[i].skb == NULL);
+
+		unix_state_lock(cur);
+
+		if (sock_flag(cur, SOCK_DEAD)) {
+			unix_state_unlock(cur);
+
+			kfree_skb(others_set->items[i].skb);
+			others_set->items[i].skb = NULL;
+			others_set->items[i].to_deliver = 0;
+			continue;
+		}
+
+		if (sock_flag(cur, SOCK_RCVTSTAMP))
+			__net_timestamp(others_set->items[i].skb);
+
+		skb_queue_tail(&cur->sk_receive_queue,
+			       others_set->items[i].skb);
+		others_set->items[i].skb = NULL;
+		if (max_level > unix_sk(cur)->recursion_level)
+			unix_sk(cur)->recursion_level = max_level;
+
+		unix_state_unlock(cur);
+	}
+	unix_mcast_unlock(group, others_set);
+
+	for (i = others_set->offset ; i < others_set->cnt ; i++) {
+		struct sock *cur = others_set->items[i].s;
+
+		if (!others_set->items[i].to_deliver)
+			continue;
+
+		cur->sk_data_ready(cur, len);
+	}
+
+	kfree_skb(skb);
+	scm_destroy(siocb->scm);
+	up_sock_set(others_set);
+	return len;
+
+out_free:
+	kfree_skb(skb);
+	if (others_set) {
+		kfree_skb_sock_set(others_set);
+		up_sock_set(others_set);
+	}
+	return err;
+}
+#endif
+
+
 /*
  *	Send AF_UNIX data.
  */
@@ -1735,6 +1939,10 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	long timeo;
 	struct scm_cookie tmp_scm;
 	int max_level;
+#ifdef CONFIG_UNIX_MULTICAST
+	struct unix_mcast_group *group = NULL;
+	struct sock_set *others_set = NULL;
+#endif
 
 	if (NULL == siocb->scm)
 		siocb->scm = &tmp_scm;
@@ -1756,8 +1964,20 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		sunaddr = NULL;
 		err = -ENOTCONN;
 		other = unix_peer_get(sk);
+
 		if (!other)
 			goto out;
+
+#ifdef CONFIG_UNIX_MULTICAST
+		group = unix_sk(other)->mcast_group;
+		if (group) {
+			others_set = unix_find_multicast_recipients(sk,
+				group, &err);
+
+			if (!others_set)
+				goto out;
+		}
+#endif
 	}
 
 	if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr
@@ -1795,6 +2015,28 @@ restart:
 					hash, &err);
 		if (other == NULL)
 			goto out_free;
+
+#ifdef CONFIG_UNIX_MULTICAST
+		group = unix_sk(other)->mcast_group;
+		if (group) {
+			others_set = unix_find_multicast_recipients(sk,
+				group, &err);
+
+			sock_put(other);
+			other = NULL;
+
+			if (!others_set)
+				goto out;
+		}
+	}
+
+	if (group) {
+		err = unix_dgram_sendmsg_multicast(siocb, sk, skb, group,
+			others_set, len, max_level, timeo);
+		if (err < 0)
+			goto out;
+		return err;
+#endif
 	}
 
 	if (sk_filter(other, skb) < 0) {
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH 5/8] af_unix: find the recipients of a multicast group
From: Alban Crequy @ 2011-01-21 14:39 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Lennart Poettering, netdev,
	linux-doc, linux-
  Cc: Alban Crequy
In-Reply-To: <20110121143751.57b1453d@chocolatine.cbg.collabora.co.uk>

unix_find_multicast_recipients() returns a list of recipients for the specific
multicast address. It checks the options UNIX_MREQ_SEND_TO_PEER and
UNIX_MREQ_LOOPBACK to get the right recipients.

The list of recipients is ordered and guaranteed not to have duplicates.

When the caller has finished with the list of recipients, it will call
up_sock_set() and the list can be reused by another sender.
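
The "ordered and without duplicates" property can be sketched in
userspace like this. It is an illustration only: struct item_model and
sort_unique() are hypothetical names, qsort() stands in for the kernel's
sort(), and duplicates are compacted away here whereas the patch keeps
the slot and sets its socket pointer to NULL.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sock_item: only the socket pointer. */
struct item_model {
	const void *s;
};

/* Orders items by the numeric value of the socket address. */
static int item_compare(const void *a_, const void *b_)
{
	uintptr_t a = (uintptr_t)((const struct item_model *)a_)->s;
	uintptr_t b = (uintptr_t)((const struct item_model *)b_)->s;

	return (a > b) - (a < b);
}

/* Sorts items by address and compacts duplicates; returns the new count. */
static size_t sort_unique(struct item_model *items, size_t n)
{
	size_t out = 0;

	qsort(items, n, sizeof(*items), item_compare);
	for (size_t i = 0; i < n; i++) {
		if (out > 0 && items[out - 1].s == items[i].s)
			continue;   /* same socket as the previous entry */
		items[out++] = items[i];
	}
	return out;
}
```

A fixed global order by address means two senders that lock the same
recipients always take the locks in the same sequence, which is the
deadlock-avoidance argument made in the patch comments.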

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
---
 net/unix/af_unix.c |  259 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 256 insertions(+), 3 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index f25c020..fe0d3bb 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -114,18 +114,84 @@
 #include <linux/mount.h>
 #include <net/checksum.h>
 #include <linux/security.h>
-
-static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
-static DEFINE_SPINLOCK(unix_table_lock);
 #ifdef CONFIG_UNIX_MULTICAST
+#include <linux/sort.h>
+
 static DEFINE_SPINLOCK(unix_multicast_lock);
 #endif
+static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
+static DEFINE_SPINLOCK(unix_table_lock);
 static atomic_long_t unix_nr_socks;
 
 #define unix_sockets_unbound	(&unix_socket_table[UNIX_HASH_SIZE])
 
 #define UNIX_ABSTRACT(sk)	(unix_sk(sk)->addr->hash != UNIX_HASH_SIZE)
 
+#ifdef CONFIG_UNIX_MULTICAST
+/* Array of sockets used in multicast deliveries */
+struct sock_item {
+	/* constant fields */
+	struct sock *s;
+	unsigned int flags;
+
+	/* fields reinitialized at every send */
+	struct sk_buff *skb;
+	unsigned int to_deliver:1;
+};
+
+struct sock_set {
+	/* struct sock_set is used by one sender at a time */
+	struct semaphore sem;
+	struct hlist_node list;
+	struct rcu_head rcu;
+	int generation;
+
+	/* the sender should consider only sockets from items[offset] to
+	 * item[cnt-1] */
+	int cnt;
+	int offset;
+	/* Bitfield of (struct unix_mcast_group)->lock spinlocks to take in
+	 * order to guarantee causal order of delivery */
+	u8 hash;
+	/* ordered list of sockets without duplicates. Cell zero is reserved
+	 * for sending a message to the accepted socket (SOCK_SEQPACKET only).
+	 */
+	struct sock_item items[0];
+};
+
+static void up_sock_set(struct sock_set *set)
+{
+	if ((set->offset == 0) && set->items[0].s) {
+		sock_put(set->items[0].s);
+		set->items[0].s = NULL;
+		set->items[0].skb = NULL;
+	}
+	up(&set->sem);
+}
+
+static void kfree_sock_set(struct sock_set *set)
+{
+	int i;
+	for (i = set->offset ; i < set->cnt ; i++) {
+		if (set->items[i].s)
+			sock_put(set->items[i].s);
+	}
+	kfree(set);
+}
+
+static int sock_item_compare(const void *_a, const void *_b)
+{
+	const struct sock_item *a = _a;
+	const struct sock_item *b = _b;
+	if (a->s > b->s)
+		return 1;
+	else if (a->s < b->s)
+		return -1;
+	else
+		return 0;
+}
+#endif
+
 #ifdef CONFIG_SECURITY_NETWORK
 static void unix_get_secdata(struct scm_cookie *scm, struct sk_buff *skb)
 {
@@ -379,6 +445,7 @@ static void
 destroy_mcast_group(struct unix_mcast_group *group)
 {
 	struct unix_mcast *node;
+	struct sock_set *set;
 	struct hlist_node *pos;
 	struct hlist_node *pos_tmp;
 
@@ -392,6 +459,12 @@ destroy_mcast_group(struct unix_mcast_group *group)
 		sock_put(&node->member->sk);
 		kfree(node);
 	}
+	hlist_for_each_entry_safe(set, pos, pos_tmp,
+				  &group->mcast_members_lists,
+				  list) {
+		hlist_del_rcu(&set->list);
+		kfree_sock_set(set);
+	}
 	kfree(group);
 }
 #endif
@@ -851,6 +924,186 @@ fail:
 	return NULL;
 }
 
+#ifdef CONFIG_UNIX_MULTICAST
+static int unix_find_multicast_members(struct sock_set *set,
+				       int recipient_cnt,
+				       struct hlist_head *list)
+{
+	struct unix_mcast *node;
+	struct hlist_node *pos;
+
+	hlist_for_each_entry_rcu(node, pos, list,
+			     member_node) {
+		struct sock *s;
+
+		if (set->cnt + 1 > recipient_cnt)
+			return -ENOMEM;
+
+		s = &node->member->sk;
+		sock_hold(s);
+		set->items[set->cnt].s = s;
+		set->items[set->cnt].flags = node->flags;
+		set->cnt++;
+
+		set->hash |= 1 << ((((int)s) >> 6) & 0x07);
+	}
+
+	return 0;
+}
+
+void sock_set_reclaim(struct rcu_head *rp)
+{
+	struct sock_set *set = container_of(rp, struct sock_set, rcu);
+	kfree_sock_set(set);
+}
+
+static struct sock_set *unix_find_multicast_recipients(struct sock *sender,
+				struct unix_mcast_group *group,
+				int *err)
+{
+	struct sock_set *set = NULL; /* fake GCC */
+	struct sock_set *del_set;
+	struct hlist_node *pos;
+	int recipient_cnt;
+	int generation;
+	int i;
+
+	BUG_ON(sender == NULL);
+	BUG_ON(group == NULL);
+
+	/* Find an available set if any */
+	generation = atomic_read(&group->mcast_membership_generation);
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(set, pos, &group->mcast_members_lists,
+			     list) {
+		if (down_trylock(&set->sem)) {
+			/* the set is being used by someone else */
+			continue;
+		}
+		if (set->generation == generation) {
+			/* the set is still valid, use it */
+			break;
+		}
+		/* The set is outdated. It will be removed from the RCU list
+		 * soon but not in this lockless RCU read */
+		up(&set->sem);
+	}
+	rcu_read_unlock();
+	if (pos)
+		goto list_found;
+
+	/* We cannot allocate in the spin lock. First, count the recipients */
+try_again:
+	generation = atomic_read(&group->mcast_membership_generation);
+	recipient_cnt = atomic_read(&group->mcast_members_cnt);
+
+	/* Allocate for the set and hope the number of recipients does not
+	 * change while the lock is released. If it changes, we have to try
+	 * again... We allocate a bit more than needed, so if a _few_ members
+	 * are added in a multicast group meanwhile, we don't always need to
+	 * try again. */
+	recipient_cnt += 5;
+
+	set = kmalloc(sizeof(struct sock_set)
+		      + sizeof(struct sock_item) * recipient_cnt,
+	    GFP_KERNEL);
+	if (!set) {
+		*err = -ENOMEM;
+		return NULL;
+	}
+	sema_init(&set->sem, 0);
+	set->cnt = 1;
+	set->offset = 1;
+	set->generation = generation;
+	set->hash = 0;
+
+	rcu_read_lock();
+	if (unix_find_multicast_members(set, recipient_cnt,
+			&group->mcast_members)) {
+		rcu_read_unlock();
+		kfree_sock_set(set);
+		goto try_again;
+	}
+	rcu_read_unlock();
+
+	/* Keep the array ordered to prevent deadlocks when locking the
+	 * receiving queues. The ordering is:
+	 * - First, the accepted socket (SOCK_SEQPACKET only)
+	 * - Then, the member sockets ordered by memory address
+	 * The accepted socket cannot be member of a multicast group.
+	 */
+	sort(set->items + 1, set->cnt - 1, sizeof(struct sock_item),
+	     sock_item_compare, NULL);
+	/* Avoid duplicates */
+	for (i = 2 ; i < set->cnt ; i++) {
+		if (set->items[i].s == set->items[i - 1].s) {
+			sock_put(set->items[i - 1].s);
+			set->items[i - 1].s = NULL;
+		}
+	}
+
+	if (generation != atomic_read(&group->mcast_membership_generation)) {
+		kfree_sock_set(set);
+		goto try_again;
+	}
+
+	/* Take the lock to insert the new list but take the opportunity to do
+	 * some garbage collection on outdated lists */
+	spin_lock(&unix_multicast_lock);
+	hlist_for_each_entry_rcu(del_set, pos, &group->mcast_members_lists,
+			     list) {
+		if (down_trylock(&del_set->sem)) {
+			/* the list is being used by someone else */
+			continue;
+		}
+		if (del_set->generation < generation) {
+			hlist_del_rcu(&del_set->list);
+			call_rcu(&del_set->rcu, sock_set_reclaim);
+		}
+		up(&del_set->sem);
+	}
+	hlist_add_head_rcu(&set->list,
+			   &group->mcast_members_lists);
+	spin_unlock(&unix_multicast_lock);
+
+list_found:
+	/* List found. Initialize the first item. */
+	if (sender->sk_type == SOCK_SEQPACKET
+	    && unix_peer(sender)
+	    && unix_sk(sender)->mcast_send_to_peer) {
+		set->offset = 0;
+		sock_hold(unix_peer(sender));
+		set->items[0].s = unix_peer(sender);
+		set->items[0].skb = NULL;
+		set->items[0].to_deliver = 1;
+		set->items[0].flags =
+			unix_sk(sender)->mcast_drop_when_peer_full
+			? UNIX_MREQ_DROP_WHEN_FULL : 0;
+	} else {
+		set->items[0].s = NULL;
+		set->items[0].skb = NULL;
+		set->items[0].to_deliver = 0;
+		set->offset = 1;
+	}
+
+	/* Initialize the other items. */
+	for (i = 1 ; i < set->cnt ; i++) {
+		set->items[i].skb = NULL;
+		if (set->items[i].s == NULL) {
+			set->items[i].to_deliver = 0;
+			continue;
+		}
+		if (set->items[i].flags & UNIX_MREQ_LOOPBACK
+		    || sender != set->items[i].s)
+			set->items[i].to_deliver = 1;
+		else
+			set->items[i].to_deliver = 0;
+	}
+
+	return set;
+}
+#endif
+
 
 static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH 4/8] af_unix: create, join and leave multicast groups with setsockopt
From: Alban Crequy @ 2011-01-21 14:39 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Lennart Poettering, netdev,
	linux-doc, linux-
  Cc: Alban Crequy, Ian Molton
In-Reply-To: <20110121143751.57b1453d@chocolatine.cbg.collabora.co.uk>

Multicast is implemented on SOCK_DGRAM and SOCK_SEQPACKET unix sockets.

A userspace application can create a multicast group with:
  struct unix_mreq mreq;
  mreq.address.sun_family = AF_UNIX;
  mreq.address.sun_path[0] = '\0';
  strcpy(mreq.address.sun_path + 1, "socket-address");
  mreq.flags = 0;

  sockfd = socket(AF_UNIX, SOCK_DGRAM, 0);
  ret = setsockopt(sockfd, SOL_UNIX, UNIX_CREATE_GROUP, &mreq, sizeof(mreq));

Then a multicast group can be joined and left with:
  ret = setsockopt(sockfd, SOL_UNIX, UNIX_JOIN_GROUP, &mreq, sizeof(mreq));
  ret = setsockopt(sockfd, SOL_UNIX, UNIX_LEAVE_GROUP, &mreq, sizeof(mreq));

A socket can be a member of several multicast groups.
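
For completeness, a small helper that prepares the request for an
abstract address might look like the sketch below. The struct and flag
definitions are copied from this series and are not available in
mainline headers; fill_abstract_mreq() is a hypothetical helper name.
The resulting structure would then be passed to setsockopt() as shown
above, which only works on a kernel carrying these patches.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Definitions copied from this patch series; not in mainline headers. */
#define UNIX_MREQ_LOOPBACK        0x01
#define UNIX_MREQ_SEND_TO_PEER    0x02
#define UNIX_MREQ_DROP_WHEN_FULL  0x04

struct unix_mreq {
	struct sockaddr_un address;
	unsigned int flags;
};

/* Fills mreq for an abstract address; returns 0, or -1 if name is too long. */
static int fill_abstract_mreq(struct unix_mreq *mreq, const char *name,
			      unsigned int flags)
{
	size_t len = strlen(name);

	/* One byte for the leading NUL, one for the terminating NUL. */
	if (len + 2 > sizeof(mreq->address.sun_path))
		return -1;

	memset(mreq, 0, sizeof(*mreq));
	mreq->address.sun_family = AF_UNIX;
	/* sun_path[0] stays '\0', which marks the address as abstract. */
	memcpy(mreq->address.sun_path + 1, name, len);
	mreq->flags = flags;
	return 0;
}
```

Checking the length before copying avoids the overflow that a plain
strcpy() into sun_path would allow for long names.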

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Signed-off-by: Ian Molton <ian.molton@collabora.co.uk>
---
 include/net/af_unix.h |   77 +++++++++++
 net/unix/Kconfig      |   10 ++
 net/unix/af_unix.c    |  339 ++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 424 insertions(+), 2 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 18e5c3f..f2b605b 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -41,7 +41,62 @@ struct unix_skb_parms {
 				spin_lock_nested(&unix_sk(s)->lock, \
 				SINGLE_DEPTH_NESTING)
 
+/* UNIX socket options */
+#define UNIX_CREATE_GROUP	1
+#define UNIX_JOIN_GROUP		2
+#define UNIX_LEAVE_GROUP	3
+
+/* Flags on unix_mreq */
+
+/* On UNIX_JOIN_GROUP: the socket will receive its own messages */
+#define UNIX_MREQ_LOOPBACK		0x01
+
+/* ON UNIX_JOIN_GROUP: the messages will also be received by the peer */
+#define UNIX_MREQ_SEND_TO_PEER		0x02
+
+/* ON UNIX_JOIN_GROUP: just drop the message instead of blocking if the
+ * receiving queue is full */
+#define UNIX_MREQ_DROP_WHEN_FULL	0x04
+
+struct unix_mreq {
+	struct sockaddr_un	address;
+	unsigned int		flags;
+};
+
 #ifdef __KERNEL__
+
+struct unix_mcast_group {
+	/* RCU list of (struct unix_mcast)->member_node
+	 * Messages sent to the multicast group are delivered to this list of
+	 * members */
+	struct hlist_head	mcast_members;
+
+	/* RCU list of (struct unix_mcast)->member_dead_node
+	 * When the group dies, previous members' reference counters must be
+	 * decremented */
+	struct hlist_head	mcast_dead_members;
+
+	/* RCU list of (struct sock_set)->list */
+	struct hlist_head	mcast_members_lists;
+
+	atomic_t		mcast_members_cnt;
+
+	/* The generation is incremented each time a peer joins or
+	 * leaves the group. It is used to invalidate old lists
+	 * struct sock_set */
+	atomic_t		mcast_membership_generation;
+
+	/* Locks to guarantee causal order in deliveries */
+#define MCAST_LOCK_CLASS_COUNT	8
+	spinlock_t		lock[MCAST_LOCK_CLASS_COUNT];
+
+	/* The group is referenced by:
+	 * - the socket who created the multicast group
+	 * - the accepted sockets (SOCK_SEQPACKET only)
+	 * - the current members of the group */
+	atomic_t		refcnt;
+};
+
 /* The AF_UNIX socket */
 struct unix_sock {
 	/* WARNING: sk has to be the first member */
@@ -57,9 +112,31 @@ struct unix_sock {
 	spinlock_t		lock;
 	unsigned int		gc_candidate : 1;
 	unsigned int		gc_maybe_cycle : 1;
+	unsigned int		mcast_send_to_peer : 1;
+	unsigned int		mcast_drop_when_peer_full : 1;
 	unsigned char		recursion_level;
+	struct unix_mcast_group	*mcast_group;
+
+	/* RCU List of (struct unix_mcast)->subscription_node
+	 * A socket can subscribe to several multicast groups
+	 */
+	struct hlist_head	mcast_subscriptions;
+
 	struct socket_wq	peer_wq;
 };
+
+struct unix_mcast {
+	struct unix_sock	*member;
+	struct unix_mcast_group	*group;
+	unsigned int		flags;
+	struct hlist_node	subscription_node;
+	/* A subscription cannot be both alive and dead but we cannot use the
+	 * same field because RCU readers run lockless. member_dead_node is
+	 * not read by lockless RCU readers. */
+	struct hlist_node	member_node;
+	struct hlist_node	member_dead_node;
+};
+
 #define unix_sk(__sk) ((struct unix_sock *)__sk)
 
 #define peer_wait peer_wq.wait
diff --git a/net/unix/Kconfig b/net/unix/Kconfig
index 5a69733..e3e5d9b 100644
--- a/net/unix/Kconfig
+++ b/net/unix/Kconfig
@@ -19,3 +19,13 @@ config UNIX
 
 	  Say Y unless you know what you are doing.
 
+config UNIX_MULTICAST
+	depends on UNIX && EXPERIMENTAL
+	bool "Multicast over Unix domain sockets"
+	---help---
+	  If you say Y here, you will include support for multicasting on Unix
+	  domain sockets. Support is available for SOCK_DGRAM and
+	  SOCK_SEQPACKET. Certain types of delivery synchronisation are
+	  provided, see Documentation/networking/multicast-unix-sockets.txt
+
+
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 7ea85de..f25c020 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -117,6 +117,9 @@
 
 static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
 static DEFINE_SPINLOCK(unix_table_lock);
+#ifdef CONFIG_UNIX_MULTICAST
+static DEFINE_SPINLOCK(unix_multicast_lock);
+#endif
 static atomic_long_t unix_nr_socks;
 
 #define unix_sockets_unbound	(&unix_socket_table[UNIX_HASH_SIZE])
@@ -371,6 +374,28 @@ static void unix_sock_destructor(struct sock *sk)
 #endif
 }
 
+#ifdef CONFIG_UNIX_MULTICAST
+static void
+destroy_mcast_group(struct unix_mcast_group *group)
+{
+	struct unix_mcast *node;
+	struct hlist_node *pos;
+	struct hlist_node *pos_tmp;
+
+	BUG_ON(atomic_read(&group->refcnt) != 0);
+	BUG_ON(!hlist_empty(&group->mcast_members));
+
+	hlist_for_each_entry_safe(node, pos, pos_tmp,
+				  &group->mcast_dead_members,
+				  member_dead_node) {
+		hlist_del_rcu(&node->member_dead_node);
+		sock_put(&node->member->sk);
+		kfree(node);
+	}
+	kfree(group);
+}
+#endif
+
 static int unix_release_sock(struct sock *sk, int embrion)
 {
 	struct unix_sock *u = unix_sk(sk);
@@ -379,6 +404,11 @@ static int unix_release_sock(struct sock *sk, int embrion)
 	struct sock *skpair;
 	struct sk_buff *skb;
 	int state;
+#ifdef CONFIG_UNIX_MULTICAST
+	struct unix_mcast *node;
+	struct hlist_node *pos;
+	struct hlist_node *pos_tmp;
+#endif
 
 	unix_remove_socket(sk);
 
@@ -392,6 +422,23 @@ static int unix_release_sock(struct sock *sk, int embrion)
 	u->mnt	     = NULL;
 	state = sk->sk_state;
 	sk->sk_state = TCP_CLOSE;
+#ifdef CONFIG_UNIX_MULTICAST
+	spin_lock(&unix_multicast_lock);
+	hlist_for_each_entry_safe(node, pos, pos_tmp, &u->mcast_subscriptions,
+				  subscription_node) {
+		hlist_del_rcu(&node->member_node);
+		hlist_del_rcu(&node->subscription_node);
+		atomic_dec(&node->group->mcast_members_cnt);
+		atomic_inc(&node->group->mcast_membership_generation);
+		hlist_add_head_rcu(&node->member_dead_node,
+				   &node->group->mcast_dead_members);
+		if (atomic_dec_and_test(&node->group->refcnt))
+			destroy_mcast_group(node->group);
+	}
+	if (u->mcast_group && atomic_dec_and_test(&u->mcast_group->refcnt))
+		destroy_mcast_group(u->mcast_group);
+	spin_unlock(&unix_multicast_lock);
+#endif
 	unix_state_unlock(sk);
 
 	wake_up_interruptible_all(&u->peer_wait);
@@ -631,6 +678,9 @@ static struct sock *unix_create1(struct net *net, struct socket *sock)
 	atomic_long_set(&u->inflight, 0);
 	INIT_LIST_HEAD(&u->link);
 	mutex_init(&u->readlock); /* single task reading lock */
+#ifdef CONFIG_UNIX_MULTICAST
+	INIT_HLIST_HEAD(&u->mcast_subscriptions);
+#endif
 	init_waitqueue_head(&u->peer_wait);
 	unix_insert_socket(unix_sockets_unbound, sk);
 out:
@@ -1055,6 +1105,10 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	struct sock *newsk = NULL;
 	struct sock *other = NULL;
 	struct sk_buff *skb = NULL;
+#ifdef CONFIG_UNIX_MULTICAST
+	struct unix_mcast *node;
+	struct hlist_node *pos;
+#endif
 	unsigned hash;
 	int st;
 	int err;
@@ -1082,6 +1136,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	newsk = unix_create1(sock_net(sk), NULL);
 	if (newsk == NULL)
 		goto out;
+	newu = unix_sk(newsk);
 
 	/* Allocate skb for sending to listening sock */
 	skb = sock_wmalloc(newsk, 1, 0, GFP_KERNEL);
@@ -1094,6 +1149,8 @@ restart:
 	if (!other)
 		goto out;
 
+	otheru = unix_sk(other);
+
 	/* Latch state of peer */
 	unix_state_lock(other);
 
@@ -1165,6 +1222,18 @@ restart:
 		goto out_unlock;
 	}
 
+#ifdef CONFIG_UNIX_MULTICAST
+	/* Multicast sockets */
+	hlist_for_each_entry_rcu(node, pos, &u->mcast_subscriptions,
+				 subscription_node) {
+		if (node->group == otheru->mcast_group) {
+			atomic_inc(&otheru->mcast_group->refcnt);
+			newu->mcast_group = otheru->mcast_group;
+			break;
+		}
+	}
+#endif
+
 	/* The way is open! Fastly set all the necessary fields... */
 
 	sock_hold(sk);
@@ -1172,9 +1241,7 @@ restart:
 	newsk->sk_state		= TCP_ESTABLISHED;
 	newsk->sk_type		= sk->sk_type;
 	init_peercred(newsk);
-	newu = unix_sk(newsk);
 	newsk->sk_wq		= &newu->peer_wq;
-	otheru = unix_sk(other);
 
 	/* copy address information from listening to new sock*/
 	if (otheru->addr) {
@@ -1563,10 +1630,278 @@ out:
 }
 
 
+#ifdef CONFIG_UNIX_MULTICAST
+static int unix_mc_create(struct socket *sock, struct unix_mreq *mreq)
+{
+	struct sock *other;
+	int err;
+	unsigned hash;
+	int namelen;
+	struct unix_mcast_group *mcast_group;
+	int i;
+
+	if (mreq->address.sun_family != AF_UNIX ||
+	    mreq->address.sun_path[0] != '\0')
+		return -EINVAL;
+
+	err = unix_mkname(&mreq->address, sizeof(struct sockaddr_un), &hash);
+	if (err < 0)
+		return err;
+
+	namelen = err;
+	other = unix_find_other(sock_net(sock->sk), &mreq->address, namelen,
+				sock->type, hash, &err);
+	if (other) {
+		sock_put(other);
+		return -EADDRINUSE;
+	}
+
+	mcast_group = kmalloc(sizeof(struct unix_mcast_group), GFP_KERNEL);
+	if (!mcast_group)
+		return -ENOBUFS;
+
+	INIT_HLIST_HEAD(&mcast_group->mcast_members);
+	INIT_HLIST_HEAD(&mcast_group->mcast_dead_members);
+	INIT_HLIST_HEAD(&mcast_group->mcast_members_lists);
+	atomic_set(&mcast_group->mcast_members_cnt, 0);
+	atomic_set(&mcast_group->mcast_membership_generation, 1);
+	atomic_set(&mcast_group->refcnt, 1);
+	for (i = 0 ; i < MCAST_LOCK_CLASS_COUNT ; i++) {
+		spin_lock_init(&mcast_group->lock[i]);
+		lockdep_set_subclass(&mcast_group->lock[i], i);
+	}
+
+	err = sock->ops->bind(sock,
+		(struct sockaddr *)&mreq->address,
+		sizeof(struct sockaddr_un));
+	if (err < 0) {
+		kfree(mcast_group);
+		return err;
+	}
+
+	unix_state_lock(sock->sk);
+	unix_sk(sock->sk)->mcast_group = mcast_group;
+	unix_state_unlock(sock->sk);
+
+	return 0;
+}
+
+
+static int unix_mc_join(struct socket *sock, struct unix_mreq *mreq)
+{
+	struct unix_sock *u = unix_sk(sock->sk);
+	struct sock *other, *peer;
+	struct unix_mcast_group *group;
+	struct unix_mcast *node;
+	int err;
+	unsigned hash;
+	int namelen;
+
+	if (mreq->address.sun_family != AF_UNIX ||
+	    mreq->address.sun_path[0] != '\0')
+		return -EINVAL;
+
+	/* sockets which represent a group are not allowed to join another
+	 * group */
+	if (u->mcast_group)
+		return -EINVAL;
+
+	err = unix_autobind(sock);
+	if (err < 0)
+		return err;
+
+	err = unix_mkname(&mreq->address, sizeof(struct sockaddr_un), &hash);
+	if (err < 0)
+		return err;
+
+	namelen = err;
+	other = unix_find_other(sock_net(sock->sk), &mreq->address, namelen,
+				sock->type, hash, &err);
+	if (!other)
+		return -EINVAL;
+
+	group = unix_sk(other)->mcast_group;
+
+	if (!group) {
+		err = -EADDRINUSE;
+		goto sock_put_out;
+	}
+
+	node = kmalloc(sizeof(struct unix_mcast), GFP_KERNEL);
+	if (!node) {
+		err = -ENOMEM;
+		goto sock_put_out;
+	}
+	node->member = u;
+	node->group = group;
+	node->flags = mreq->flags;
+
+	if (sock->sk->sk_type == SOCK_SEQPACKET) {
+		peer = unix_peer_get(sock->sk);
+		if (peer) {
+			atomic_inc(&group->refcnt);
+			unix_sk(peer)->mcast_group = group;
+			sock_put(peer);
+		}
+	}
+
+	unix_state_lock(sock->sk);
+	unix_sk(sock->sk)->mcast_send_to_peer =
+		!!(mreq->flags & UNIX_MREQ_SEND_TO_PEER);
+	unix_sk(sock->sk)->mcast_drop_when_peer_full =
+		!!(mreq->flags & UNIX_MREQ_DROP_WHEN_FULL);
+	unix_state_unlock(sock->sk);
+
+	/* Keep a reference */
+	sock_hold(sock->sk);
+	atomic_inc(&group->refcnt);
+
+	spin_lock(&unix_multicast_lock);
+	hlist_add_head_rcu(&node->member_node,
+			   &group->mcast_members);
+	hlist_add_head_rcu(&node->subscription_node, &u->mcast_subscriptions);
+	atomic_inc(&group->mcast_members_cnt);
+	atomic_inc(&group->mcast_membership_generation);
+	spin_unlock(&unix_multicast_lock);
+
+	return 0;
+
+sock_put_out:
+	sock_put(other);
+	return err;
+}
+
+
+static int unix_mc_leave(struct socket *sock, struct unix_mreq *mreq)
+{
+	struct unix_sock *u = unix_sk(sock->sk);
+	struct sock *other;
+	struct unix_mcast_group *group;
+	struct unix_mcast *node;
+	struct hlist_node *pos;
+	int err;
+	unsigned hash;
+	int namelen;
+
+	if (mreq->address.sun_family != AF_UNIX ||
+	    mreq->address.sun_path[0] != '\0')
+		return -EINVAL;
+
+	err = unix_mkname(&mreq->address, sizeof(struct sockaddr_un), &hash);
+	if (err < 0)
+		return err;
+
+	namelen = err;
+	other = unix_find_other(sock_net(sock->sk), &mreq->address, namelen,
+				sock->type, hash, &err);
+	if (!other)
+		return -EINVAL;
+
+	group = unix_sk(other)->mcast_group;
+
+	if (!group) {
+		err = -EINVAL;
+		goto sock_put_out;
+	}
+
+	spin_lock(&unix_multicast_lock);
+
+	hlist_for_each_entry_rcu(node, pos, &u->mcast_subscriptions,
+			     subscription_node) {
+		if (node->group == group)
+			break;
+	}
+
+	if (!pos) {
+		spin_unlock(&unix_multicast_lock);
+		err = -EINVAL;
+		goto sock_put_out;
+	}
+
+	hlist_del_rcu(&node->member_node);
+	hlist_del_rcu(&node->subscription_node);
+	atomic_dec(&group->mcast_members_cnt);
+	atomic_inc(&group->mcast_membership_generation);
+	hlist_add_head_rcu(&node->member_dead_node,
+			   &group->mcast_dead_members);
+	spin_unlock(&unix_multicast_lock);
+
+	if (sock->sk->sk_type == SOCK_SEQPACKET) {
+		struct sock *peer = unix_peer_get(sock->sk);
+		if (peer) {
+			unix_sk(peer)->mcast_group = NULL;
+			atomic_dec(&group->refcnt);
+			sock_put(peer);
+		}
+	}
+
+	synchronize_rcu();
+
+	if (atomic_dec_and_test(&group->refcnt)) {
+		spin_lock(&unix_multicast_lock);
+		destroy_mcast_group(group);
+		spin_unlock(&unix_multicast_lock);
+	}
+
+	err = 0;
+
+	/* If the receiving queue of that socket was full, some writers on the
+	 * multicast group may be blocked */
+	wake_up_interruptible_sync_poll(&u->peer_wait,
+					POLLOUT | POLLWRNORM | POLLWRBAND);
+
+sock_put_out:
+	sock_put(other);
+	return err;
+}
+#endif
+
 static int unix_setsockopt(struct socket *sock, int level, int optname,
 			   char __user *optval, unsigned int optlen)
 {
+#ifdef CONFIG_UNIX_MULTICAST
+	struct unix_mreq mreq;
+	int err = 0;
+
+	if (level != SOL_UNIX)
+		return -ENOPROTOOPT;
+
+	switch (optname) {
+	case UNIX_CREATE_GROUP:
+	case UNIX_JOIN_GROUP:
+	case UNIX_LEAVE_GROUP:
+		if (optlen < sizeof(struct unix_mreq))
+			return -EINVAL;
+		if (copy_from_user(&mreq, optval, sizeof(struct unix_mreq)))
+			return -EFAULT;
+		break;
+
+	default:
+		break;
+	}
+
+	switch (optname) {
+	case UNIX_CREATE_GROUP:
+		err = unix_mc_create(sock, &mreq);
+		break;
+
+	case UNIX_JOIN_GROUP:
+		err = unix_mc_join(sock, &mreq);
+		break;
+
+	case UNIX_LEAVE_GROUP:
+		err = unix_mc_leave(sock, &mreq);
+		break;
+
+	default:
+		err = -ENOPROTOOPT;
+		break;
+	}
+
+	return err;
+#else
 	return -EOPNOTSUPP;
+#endif
 }
 
 
-- 
1.7.2.3
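
For illustration only (this is not part of the posted patch): a minimal userspace sketch of how a struct unix_mreq might be prepared for UNIX_CREATE_GROUP or UNIX_JOIN_GROUP. The option and flag values and the struct layout are copied from the af_unix.h hunk above; they are not in mainline headers, so the sketch defines them locally. The helper name unix_mreq_init() is hypothetical. unix_mc_create() and unix_mc_join() reject any address whose sun_path[0] is not '\0', and they pass sizeof(struct sockaddr_un) to unix_mkname(), so the padding bytes of sun_path appear to be part of the abstract name; the sketch therefore zeroes the whole structure first.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Values copied from the patch above; not available in mainline headers. */
#define UNIX_CREATE_GROUP		1
#define UNIX_JOIN_GROUP			2
#define UNIX_LEAVE_GROUP		3
#define UNIX_MREQ_LOOPBACK		0x01
#define UNIX_MREQ_SEND_TO_PEER		0x02
#define UNIX_MREQ_DROP_WHEN_FULL	0x04

struct unix_mreq {
	struct sockaddr_un	address;
	unsigned int		flags;
};

/*
 * Hypothetical helper: fill *mreq with an abstract-namespace address.
 * Returns 0 on success, -1 if the name is empty or does not fit.
 */
static int unix_mreq_init(struct unix_mreq *mreq, const char *name,
			  unsigned int flags)
{
	size_t len;

	assert(mreq != NULL && name != NULL);

	len = strlen(name);
	/* One byte of sun_path is reserved for the leading '\0'. */
	if (len == 0 || len > sizeof(mreq->address.sun_path) - 1)
		return -1;

	/* Zero everything so the padding bytes are the same on every caller. */
	memset(mreq, 0, sizeof(*mreq));
	mreq->address.sun_family = AF_UNIX;
	/* sun_path[0] stays '\0': the patch only accepts abstract addresses. */
	memcpy(mreq->address.sun_path + 1, name, len);
	mreq->flags = flags;
	return 0;
}
```

The filled structure would then be handed to setsockopt() with one of the three option names; creator and members must build the address the same way, since any difference in the padding would presumably produce a different name.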


* [PATCH 3/8] af_unix: add setsockopt on unix sockets
From: Alban Crequy @ 2011-01-21 14:39 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Lennart Poettering, netdev,
	linux-doc, linux-
  Cc: Alban Crequy
In-Reply-To: <20110121143751.57b1453d@chocolatine.cbg.collabora.co.uk>

Add a unix_setsockopt() stub that returns -EOPNOTSUPP. It is called only
on SOCK_DGRAM and SOCK_SEQPACKET unix sockets.

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
---
 net/unix/af_unix.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index d8d98d5..7ea85de 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -512,6 +512,8 @@ static unsigned int unix_dgram_poll(struct file *, struct socket *,
 				    poll_table *);
 static int unix_ioctl(struct socket *, unsigned int, unsigned long);
 static int unix_shutdown(struct socket *, int);
+static int unix_setsockopt(struct socket *, int, int,
+			   char __user *, unsigned int);
 static int unix_stream_sendmsg(struct kiocb *, struct socket *,
 			       struct msghdr *, size_t);
 static int unix_stream_recvmsg(struct kiocb *, struct socket *,
@@ -559,7 +561,7 @@ static const struct proto_ops unix_dgram_ops = {
 	.ioctl =	unix_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	unix_shutdown,
-	.setsockopt =	sock_no_setsockopt,
+	.setsockopt =	unix_setsockopt,
 	.getsockopt =	sock_no_getsockopt,
 	.sendmsg =	unix_dgram_sendmsg,
 	.recvmsg =	unix_dgram_recvmsg,
@@ -580,7 +582,7 @@ static const struct proto_ops unix_seqpacket_ops = {
 	.ioctl =	unix_ioctl,
 	.listen =	unix_listen,
 	.shutdown =	unix_shutdown,
-	.setsockopt =	sock_no_setsockopt,
+	.setsockopt =	unix_setsockopt,
 	.getsockopt =	sock_no_getsockopt,
 	.sendmsg =	unix_seqpacket_sendmsg,
 	.recvmsg =	unix_dgram_recvmsg,
@@ -1561,6 +1563,13 @@ out:
 }
 
 
+static int unix_setsockopt(struct socket *sock, int level, int optname,
+			   char __user *optval, unsigned int optlen)
+{
+	return -EOPNOTSUPP;
+}
+
+
 static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 			       struct msghdr *msg, size_t len)
 {
-- 
1.7.2.3
