Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Christoph Lameter @ 2007-08-16 18:48 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Herbert Xu, Satyam Sharma, Paul E. McKenney, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher
In-Reply-To: <18115.49946.522011.832468@cargo.ozlabs.ibm.com>

On Thu, 16 Aug 2007, Paul Mackerras wrote:

> 
> It seems that there could be a lot of places where atomic_t is used in
> a non-atomic fashion, and that those uses are either buggy, or there
> is some lock held at the time which guarantees that other CPUs aren't
> changing the value.  In both cases there is no point in using
> atomic_t; we might as well just use an ordinary int.

The point of atomic_t is to do atomic *changes* to the variable.

^ permalink raw reply

* Re: [patch 4/4] Update e1000 driver to use devres.
From: Kok, Auke @ 2007-08-16 17:36 UTC (permalink / raw)
  To: Tejun Heo; +Cc: e1000-devel, netdev, Brandon Philips
In-Reply-To: <46C43891.4080304@suse.de>

Tejun Heo wrote:
> Brandon Philips wrote:
>> -	mmio_start = pci_resource_start(pdev, BAR_0);
>>  	mmio_len = pci_resource_len(pdev, BAR_0);
> 
> You don't need mmio_len either.
> 
>> -	err = -EIO;
>> -	adapter->hw.hw_addr = ioremap(mmio_start, mmio_len);
>> +	adapter->hw.hw_addr = pcim_iomap(pdev, BAR_0, mmio_len);
> 
> Passing 0 as @max_len tells pci[m]_iomap() to use pci_resource_len() of
> the BAR.
> 
>> @@ -952,16 +948,15 @@ e1000_probe(struct pci_dev *pdev,
>>  	/* setup the private structure */
>>  
>>  	if ((err = e1000_sw_init(adapter)))
>> -		goto err_sw_init;
>> +		return err;
>>  
>>  	err = -EIO;
>>  	/* Flash BAR mapping must happen after e1000_sw_init
>>  	 * because it depends on mac_type */
>>  	if ((adapter->hw.mac_type == e1000_ich8lan) &&
>>  	   (pci_resource_flags(pdev, 1) & IORESOURCE_MEM)) {
>> -		flash_start = pci_resource_start(pdev, 1);
>>  		flash_len = pci_resource_len(pdev, 1);
> 
> Ditto.
> 
>> -		adapter->hw.flash_address = ioremap(flash_start, flash_len);
>> +		adapter->hw.flash_address = pcim_iomap(pdev, 1, flash_len);
>>  		if (!adapter->hw.flash_address)
>>  			goto err_flashmap;
>>  	}
> 


brandon,

seeing the multiple revisions I am scared that this will produce some fallout 
and e1000 already is quite fragile. I would suggest that you instead work 
against "e1000e" which is in a branch on jgarzik's netdev tree instead. This 
driver is new and it would be much more interesting to have devres used in here 
instead.

Since this driver is in -mm as well this would give your patches some testing 
before it goes upstream.

Let's leave e1000 alone for now if we can.

Auke

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/

^ permalink raw reply

* RE: [patch 4/4] Update e1000 driver to use devres.
From: Waskiewicz Jr, Peter P @ 2007-08-16 17:09 UTC (permalink / raw)
  To: Brandon Philips; +Cc: netdev, e1000-devel, teheo
In-Reply-To: <20070816170522.GA4226@ifup.org>

> Now if I insert a return -ENOMEM right after allocating tx_ring:
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -1356,6 +1356,8 @@ e1000_alloc_queues(struct e1000_adapter 
> *adapter)  {
>         adapter->tx_ring = kcalloc(adapter->num_tx_queues,
>                                    sizeof(struct 
> e1000_tx_ring), GFP_KERNEL);
> +
> +       return -ENOMEM;
>         if (!adapter->tx_ring)
>                 return -ENOMEM;
> 
> #insmod
> DEVRES ADD f7a80e80 pcim_release (8 bytes) DEVRES ADD 
> f7a80ca0 devm_free_netdev (4 bytes) DEVRES ADD eb7f0080 
> pcim_iomap_release (24 bytes) DEVRES ADD eb7f0000 
> devm_kzalloc_release (40 bytes)
> e1000_sw_init: Unable to allocate memory for queues DEVRES 
> REL eb7f0000 devm_kzalloc_release (40 bytes) DEVRES REL 
> eb7f0080 pcim_iomap_release (24 bytes) DEVRES REL f7a80ca0 
> devm_free_netdev (4 bytes) DEVRES REL f7a80e80 pcim_release (8 bytes)
> ACPI: PCI interrupt for device 0000:02:00.0 disabled
> e1000: probe of 0000:02:00.0 failed with error -12
> 
> Since we are returning an error from probe the driver core calls
> devres_release_all(dev) which releases all of the resources 
> in the right order.  See really_probe() in drivers/base/dd.c.

This looks fine then to me.  Thanks for the explanation.

-PJ

^ permalink raw reply

* Re: [patch 4/4] Update e1000 driver to use devres.
From: Brandon Philips @ 2007-08-16 17:05 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: e1000-devel, netdev, teheo
In-Reply-To: <D5C1322C3E673F459512FB59E0DDC329036D21B5@orsmsx414.amr.corp.intel.com>

On 01:38 Thu 16 Aug 2007, Waskiewicz Jr, Peter P wrote:
> > -	err = -ENOMEM;
> > -	netdev = alloc_etherdev(sizeof(struct e1000_adapter));
> > +	netdev = devm_alloc_etherdev(&pdev->dev, sizeof(struct 
> > +e1000_adapter));
> >  	if (!netdev)
> > -		goto err_alloc_etherdev;
> > +		return -ENOMEM;
> 
> I'm a bit confused why you removed the goto's, and then removed all the
> target unwinding code at the bottom of e1000_probe().   Those labels
> clean up resources if something fails, like the err_sw_init label.  I
> don't see anything in the devres code that jumps out at me that explains
> why we can do away with these cleanup routines.  Thoughts?

Have you read Documentation/driver-model/devres.txt?  That has a good
explanation.  Here is a practical explanation on how it works too.

This is the output from a normal modprobe then rmmod of e1000 with
devres debugging on.

# modprobe
DEVRES ADD f7fd6cc0 pcim_release (8 bytes)
DEVRES ADD f7a80fe0 devm_free_netdev (4 bytes) # netdev **p for free_netdev
DEVRES ADD f7dbe780 pcim_iomap_release (24 bytes) # adapter->hw.hw_addr
DEVRES ADD f7dbe9c0 devm_kzalloc_release (40 bytes) # adapter->tx_ring
DEVRES ADD f7dbe8c0 devm_kzalloc_release (44 bytes) # adapter->rx_ring

# rmmod
DEVRES REL f7dbe8c0 devm_kzalloc_release (44 bytes) # adapter->tx_ring
DEVRES REL f7dbe9c0 devm_kzalloc_release (40 bytes) # adapter->rx_ring
DEVRES REL f7dbe780 pcim_iomap_release (24 bytes) # adapter->hw.hw_addr
DEVRES REL f7a80fe0 devm_free_netdev (4 bytes) # called free_netdev
DEVRES REL f7fd6cc0 pcim_release (8 bytes)

Now if I insert a return -ENOMEM right after allocating tx_ring:
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -1356,6 +1356,8 @@ e1000_alloc_queues(struct e1000_adapter *adapter)
 {
        adapter->tx_ring = kcalloc(adapter->num_tx_queues,
                                   sizeof(struct e1000_tx_ring),
GFP_KERNEL);
+
+       return -ENOMEM;
        if (!adapter->tx_ring)
                return -ENOMEM;

#insmod
DEVRES ADD f7a80e80 pcim_release (8 bytes)
DEVRES ADD f7a80ca0 devm_free_netdev (4 bytes)
DEVRES ADD eb7f0080 pcim_iomap_release (24 bytes)
DEVRES ADD eb7f0000 devm_kzalloc_release (40 bytes)
e1000_sw_init: Unable to allocate memory for queues 
DEVRES REL eb7f0000 devm_kzalloc_release (40 bytes)
DEVRES REL eb7f0080 pcim_iomap_release (24 bytes)
DEVRES REL f7a80ca0 devm_free_netdev (4 bytes)
DEVRES REL f7a80e80 pcim_release (8 bytes)
ACPI: PCI interrupt for device 0000:02:00.0 disabled
e1000: probe of 0000:02:00.0 failed with error -12

Since we are returning an error from probe the driver core calls
devres_release_all(dev) which releases all of the resources in the right
order.  See really_probe() in drivers/base/dd.c.

SIDE NOTE
---------
I ran into a possible e1000 bug with the little -ENOMEM patch above both
with and without the devres patches.  The driver seems to leave the
EEPROM in a bad state on error because I get this error after trying to
insert the module again: 

e1000_probe: The EEPROM Checksum Is Not Valid 

A power cycle but not a reboot fixes it.

Thanks,

	Brandon

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/

^ permalink raw reply

* Re: [PATCH 2/2] [DM9000] External PHY support
From: Ben Dooks @ 2007-08-16 17:00 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: netdev, Ben Dooks
In-Reply-To: <200708161016.08180.laurentp@cse-semaphore.com>

On Thu, Aug 16, 2007 at 10:16:08AM +0200, Laurent Pinchart wrote:
> This patch adds a flag to the DM9000 platform data which, when set,
> configures the device to use an external PHY.
> 
> Signed-off-by: Laurent Pinchart <laurentp@cse-semaphore.com>
Acked-by: Ben Dooks <ben-linux@fluff.org>
> ---
>  drivers/net/dm9000.c   |    6 ++++++
>  include/linux/dm9000.h |    1 +
>  2 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c
> index a424810..a86984e 100644
> --- a/drivers/net/dm9000.c
> +++ b/drivers/net/dm9000.c
> @@ -136,6 +136,7 @@ typedef struct board_info {
>  	u16 dbug_cnt;
>  	u8 io_mode;		/* 0:word, 2:byte */
>  	u8 phy_addr;
> +	unsigned int flags;
>  
>  	void (*inblk)(void __iomem *port, void *data, int length);
>  	void (*outblk)(void __iomem *port, void *data, int length);
> @@ -528,6 +529,8 @@ dm9000_probe(struct platform_device *pdev)
>  
>  		if (pdata->dumpblk != NULL)
>  			db->dumpblk = pdata->dumpblk;
> +
> +		db->flags = pdata->flags;
>  	}
>  
>  	dm9000_reset(db);
> @@ -670,6 +673,9 @@ dm9000_init_dm9000(struct net_device *dev)
>  	iow(db, DM9000_GPCR, GPCR_GEP_CNTL);	/* Let GPIO0 output */
>  	iow(db, DM9000_GPR, 0);	/* Enable PHY */
>  
> +	if (db->flags & DM9000_PLATF_EXT_PHY)
> +		iow(db, DM9000_NCR, NCR_EXT_PHY);
> +
>  	/* Program operating register */
>  	iow(db, DM9000_TCR, 0);	        /* TX Polling clear */
>  	iow(db, DM9000_BPTR, 0x3f);	/* Less 3Kb, 200us */
> diff --git a/include/linux/dm9000.h b/include/linux/dm9000.h
> index 0008e2a..ea530fd 100644
> --- a/include/linux/dm9000.h
> +++ b/include/linux/dm9000.h
> @@ -19,6 +19,7 @@
>  #define DM9000_PLATF_8BITONLY	(0x0001)
>  #define DM9000_PLATF_16BITONLY	(0x0002)
>  #define DM9000_PLATF_32BITONLY	(0x0004)
> +#define DM9000_PLATF_EXT_PHY	(0x0008)
>  
>  /* platfrom data for platfrom device structure's platfrom_data field */
>  
> -- 
> 1.5.0
> 
> -- 
> Laurent Pinchart
> CSE Semaphore Belgium
> 
> Chauss?e de Bruxelles, 732A
> B-1410 Waterloo
> Belgium
> 
> T +32 (2) 387 42 59
> F +32 (2) 387 42 75
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Ben (ben@fluff.org, http://www.fluff.org/)

  'a smiley only costs 4 bytes'

^ permalink raw reply

* Re: [PATCH 1/2] [DM9000] Added support for big-endian hosts
From: Ben Dooks @ 2007-08-16 17:00 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: netdev, Ben Dooks
In-Reply-To: <200708161015.35337.laurentp@cse-semaphore.com>

On Thu, Aug 16, 2007 at 10:15:35AM +0200, Laurent Pinchart wrote:
> This patch splits the receive status in 8bit wide fields and convert the
> packet length from little endian to CPU byte order.
> 
> Signed-off-by: Laurent Pinchart <laurentp@cse-semaphore.com>
Acked-by: Ben Dooks <ben-linux@fluff.org>
> ---
>  drivers/net/dm9000.c |   13 +++++++------
>  1 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c
> index c3de81b..a424810 100644
> --- a/drivers/net/dm9000.c
> +++ b/drivers/net/dm9000.c
> @@ -894,7 +894,8 @@ dm9000_timer(unsigned long data)
>  }
>  
>  struct dm9000_rxhdr {
> -	u16	RxStatus;
> +	u8	RxPktReady;
> +	u8	RxStatus;
>  	u16	RxLen;
>  } __attribute__((__packed__));
>  
> @@ -935,7 +936,7 @@ dm9000_rx(struct net_device *dev)
>  
>  		(db->inblk)(db->io_data, &rxhdr, sizeof(rxhdr));
>  
> -		RxLen = rxhdr.RxLen;
> +		RxLen = le16_to_cpu(rxhdr.RxLen);
>  
>  		/* Packet Status check */
>  		if (RxLen < 0x40) {
> @@ -947,17 +948,17 @@ dm9000_rx(struct net_device *dev)
>  			PRINTK1("RST: RX Len:%x\n", RxLen);
>  		}
>  
> -		if (rxhdr.RxStatus & 0xbf00) {
> +		if (rxhdr.RxStatus & 0xbf) {
>  			GoodPacket = false;
> -			if (rxhdr.RxStatus & 0x100) {
> +			if (rxhdr.RxStatus & 0x01) {
>  				PRINTK1("fifo error\n");
>  				db->stats.rx_fifo_errors++;
>  			}
> -			if (rxhdr.RxStatus & 0x200) {
> +			if (rxhdr.RxStatus & 0x02) {
>  				PRINTK1("crc error\n");
>  				db->stats.rx_crc_errors++;
>  			}
> -			if (rxhdr.RxStatus & 0x8000) {
> +			if (rxhdr.RxStatus & 0x80) {
>  				PRINTK1("length error\n");
>  				db->stats.rx_length_errors++;
>  			}
> -- 
> 1.5.0
> 
> -- 
> Laurent Pinchart
> CSE Semaphore Belgium
> 
> Chauss?e de Bruxelles, 732A
> B-1410 Waterloo
> Belgium
> 
> T +32 (2) 387 42 59
> F +32 (2) 387 42 75
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Ben (ben@fluff.org, http://www.fluff.org/)

  'a smiley only costs 4 bytes'

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-08-16 16:34 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Stefan Richter, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher
In-Reply-To: <20070816104250.GB2927@gondor.apana.org.au>

On Thu, Aug 16, 2007 at 06:42:50PM +0800, Herbert Xu wrote:
> On Thu, Aug 16, 2007 at 12:31:03PM +0200, Stefan Richter wrote:
> > 
> > PS:  Just to clarify, I'm not speaking for the volatile modifier.  I'm
> > not speaking for any particular implementation of atomic_t and its
> > accessors at all.  All I am saying is that
> >   - we use atomically accessed data types because we concurrently but
> >     locklessly access this data,
> >   - hence a read access to this data that could be optimized away
> >     makes *no sense at all*.
> 
> No sane compiler can optimise away an atomic_read per se.
> That's only possible if there's a preceding atomic_set or
> atomic_read, with no barriers in the middle.
> 
> If that's the case, then one has to conclude that doing
> away with the second read is acceptable, as otherwise
> a memory (or at least a compiler) barrier should have been
> used.

The compiler can also reorder non-volatile accesses.  For an example
patch that cares about this, please see:

	http://lkml.org/lkml/2007/8/7/280

This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and
rcu_read_unlock() to ensure that accesses aren't reordered with respect
to interrupt handlers and NMIs/SMIs running on that same CPU.

> In fact, volatile doesn't guarantee that the memory gets
> read anyway.  You might be reading some stale value out
> of the cache.  Granted this doesn't happen on x86 but
> when you're coding for the kernel you can't make such
> assumptions.
> 
> So the point here is that if you don't mind getting a stale
> value from the CPU cache when doing an atomic_read, then
> surely you won't mind getting a stale value from the compiler
> "cache".

Absolutely disagree.  An interrupt/NMI/SMI handler running on the CPU
will see the same value (whether in cache or in store buffer) that
the mainline code will see.  In this case, we don't care about CPU
misordering, only about compiler misordering.  It is easy to see
other uses that combine communication with handlers on the current
CPU with communication among CPUs -- again, see prior messages in
this thread.

> > So, the architecture guys can implement atomic_read however they want
> > --- as long as it cannot be optimized away.*
> 
> They can implement it however they want as long as it stays
> atomic.

Precisely.  And volatility is a key property of "atomic".  Let's please
not throw it away.

						Thanx, Paul

^ permalink raw reply

* lockdep report in the bonding code.
From: Dave Jones @ 2007-08-16 16:32 UTC (permalink / raw)
  To: netdev

A Fedora users reported this against our 2.6.23-rc3 build

	Dave

NET: Registered protocol family 10
lo: Disabled Privacy Extensions
Ethernet Channel Bonding Driver: v3.1.3 (June 13, 2007)
bonding: MII link monitoring set to 100 ms
ADDRCONF(NETDEV_UP): bond0: link is not ready
bonding: bond0: Adding slave eth0.
e100: eth0: e100_watchdog: link up, 100Mbps, half-duplex
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: enslaving eth0 as an active interface with an up link.
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready

=================================
[ INFO: inconsistent lock state ]
2.6.23-0.104.rc3.fc8 #1
---------------------------------
inconsistent {softirq-on-W} -> {in-softirq-W} usage.
events/0/9 [HC0[0]:SC1[2]:HE1:SE0] takes:
 (&(bond_info->tx_hashtbl_lock)){-+..}, at: [<f8ae4cc2>] bond_alb_xmit+0x26a/0x42c [bonding]
{softirq-on-W} state was registered at:
  [<c044a6e4>] __lock_acquire+0x4ff/0xc67
  [<c044b2c6>] lock_acquire+0x7b/0x9e
  [<c062ecc8>] _spin_lock+0x2e/0x58
  [<f8ae492e>] bond_alb_initialize+0x64/0x18e [bonding]
  [<f8ae1256>] bond_open+0x33/0x175 [bonding]
  [<c05ca8aa>] dev_open+0x31/0x6c
  [<c05c8a01>] dev_change_flags+0xa3/0x156
  [<c0609231>] devinet_ioctl+0x207/0x50e
  [<c06098df>] inet_ioctl+0x86/0xa4
  [<c05bec32>] sock_ioctl+0x1ac/0x1c9
  [<c0490b76>] do_ioctl+0x22/0x68
  [<c0490e05>] vfs_ioctl+0x249/0x25c
  [<c0490e61>] sys_ioctl+0x49/0x64
  [<c040519e>] syscall_call+0x7/0xb
  [<ffffffff>] 0xffffffff
irq event stamp: 266326
hardirqs last  enabled at (266326): [<c043258f>] local_bh_enable_ip+0xf3/0x106
hardirqs last disabled at (266325): [<c0432516>] local_bh_enable_ip+0x7a/0x106
softirqs last  enabled at (266306): [<c05e2d20>] rt_run_flush+0x6e/0x97
softirqs last disabled at (266309): [<c0407548>] do_softirq+0x74/0xf7

other info that might help us debug this:
3 locks held by events/0/9:
 #0:  (rtnl_mutex){--..}, at: [<c062d8a9>] mutex_lock+0x21/0x24
 #1:  (&bond->lock){-.-+}, at: [<f8ae4a90>] bond_alb_xmit+0x38/0x42c [bonding]
 #2:  (&bond->curr_slave_lock){..-+}, at: [<f8ae4a98>] bond_alb_xmit+0x40/0x42c [bonding]

stack backtrace:
 [<c04063d8>] show_trace_log_lvl+0x1a/0x2f
 [<c0406e71>] show_trace+0x12/0x14
 [<c0406e89>] dump_stack+0x16/0x18
 [<c0448f8a>] print_usage_bug+0x141/0x14b
 [<c0449810>] mark_lock+0x12f/0x472
 [<c044a66c>] __lock_acquire+0x487/0xc67
 [<c044b2c6>] lock_acquire+0x7b/0x9e
 [<c062ecc8>] _spin_lock+0x2e/0x58
 [<f8ae4cc2>] bond_alb_xmit+0x26a/0x42c [bonding]
 [<c05c9467>] dev_hard_start_xmit+0x21c/0x27a
 [<c05cb489>] dev_queue_xmit+0x1f7/0x2aa
 [<f8b274e0>] mld_sendpack+0x210/0x35a [ipv6]
 [<f8b28535>] mld_ifc_timer_expire+0x1e9/0x211 [ipv6]
 [<c0435a5f>] run_timer_softirq+0x127/0x18f
 [<c0432708>] __do_softirq+0x78/0xff
 [<c0407548>] do_softirq+0x74/0xf7
 [<c04325e6>] irq_exit+0x44/0x46
 [<c041d3a3>] smp_apic_timer_interrupt+0x74/0x81
 [<c0405cb7>] apic_timer_interrupt+0x33/0x38
 [<c062eba6>] _spin_unlock_bh+0x25/0x28
 [<c05e2d20>] rt_run_flush+0x6e/0x97
 [<c05e471c>] rt_cache_flush+0x7c/0xaf
 [<c060e86f>] fib_netdev_event+0x62/0x68
 [<c0630e6c>] notifier_call_chain+0x2b/0x4a
 [<c04391c8>] __raw_notifier_call_chain+0x19/0x1e
 [<c04391e7>] raw_notifier_call_chain+0x1a/0x1c
 [<c05c84f3>] netdev_state_change+0x20/0x31
 [<c05d251a>] __linkwatch_run_queue+0x159/0x183
 [<c05d2564>] linkwatch_event+0x20/0x27
 [<c043bd9b>] run_workqueue+0x7d/0x129
 [<c043c75b>] worker_thread+0xbb/0xc8
 [<c043ef6f>] kthread+0x3b/0x64
 [<c0405e5b>] kernel_thread_helper+0x7/0x10
 =======================
bonding: bond0: Adding slave eth1.
e100: eth1: e100_watchdog: link up, 100Mbps, half-duplex
bonding: bond0: enslaving eth1 as an active interface with an up link.
audit(1187170179.728:3): audit_pid=1885 old=0 by auid=4294967295
bond0: no IPv6 routers present

-- 
http://www.codemonkey.org.uk

^ permalink raw reply

* Re: drivers/infiniband/mlx/mad.c misplaced ;
From: Paul Mundt @ 2007-08-16 16:19 UTC (permalink / raw)
  To: Joe Perches, Andy Whitcroft
  Cc: Dave Jones, Linux Kernel, rolandd, Chas Williams, isdn4linux,
	mikep, netdev, swen, linux390, linux-s390, jdike,
	user-mode-linux-devel, user-mode-linux-user, netfilter-devel,
	coreteam, Randy Dunlap, Andrew Morton
In-Reply-To: <1187224811.5906.55.camel@localhost>

On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote:
> $ egrep -r --include=*.c "\bif[[:space:]]*\([^\)]*\)[[:space:]]*\;" * 
> arch/sh/boards/se/7343/io.c:    if (0) ;
> drivers/atm/iphase.c:     if (!desc1) ;
> drivers/infiniband/hw/mlx4/mad.c:       if (!err);
> drivers/isdn/capi/capiutil.c:   else if (c <= 0x0f);
> drivers/net/tokenring/ibmtr.c:  else if (ti->shared_ram_paging == 0xf);  /* No paging in adapter */
> drivers/s390/scsi/zfcp_erp.c:           if (status == ZFCP_ERP_SUCCEEDED) ;     /* no further action */
> fs/hostfs/hostfs_user.c:        if(attrs->ia_valid & HOSTFS_ATTR_CTIME) ;
> net/netfilter/xt_u32.c:         if (skb->len < 4 || pos > skb->len - 4);
> sound/pci/au88x0/au88x0_synth.c:                                if (eax == 0) ;

On Thu, Aug 16, 2007 at 02:18:40PM +0100, Andy Whitcroft wrote:
> A couple of people suggested adding checks to checkpatch for trailing
> semicolons on conditionals, where the conditional block may not be
> actually conditional:
> 
> 	if (err);
> 		return err;
> 
> While regression testing the changes, I ran these checks across the
> whole of 2.6.23-rc3 and there appear to be 5 places where this is
> occurs (above and beyond the IPv6 one which triggered this effort)
> and a benign use which could be confused later which it seems safest
> to fix.
> 
> Following this email are 6 patches for these issues, relevant
> maintainers cc'd.  All against 2.6.23-rc3
> 
It looks like you may want to refine your search parameters to match the
above, as there are at least a few cases where the space exists.

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Stefan Richter @ 2007-08-16 16:19 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Herbert Xu, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher
In-Reply-To: <Pine.LNX.4.64.0708161743500.13267@kivilampi-30.cs.helsinki.fi>

Ilpo Järvinen wrote:
> I looked around a bit by using some command lines and ended up wondering 
> if these are equal to busy-wait case (and should be fixed) or not:
> 
> ./drivers/telephony/ixj.c
> 6674:   while (atomic_read(&j->DSPWrite) > 0)
> 6675-           atomic_dec(&j->DSPWrite);
> 
> ...besides that, there are couple of more similar cases in the same file 
> (with braces)...

Generally, ixj.c has several occurrences of couples of atomic write and
atomic read which potentially do not do what the author wanted.
-- 
Stefan Richter
-=====-=-=== =--- =----
http://arcgraph.de/sr/

^ permalink raw reply

* Re: [PATCH] lockdep: annotate rcu_read_{,un}lock()
From: Paul E. McKenney @ 2007-08-16 16:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, herbert, 123.oleg, netdev, linux-kernel,
	David Miller, Daniel Walker, josht
In-Reply-To: <1187274307.6114.92.camel@twins>

On Thu, Aug 16, 2007 at 04:25:07PM +0200, Peter Zijlstra wrote:
> 
> There seem to be some unbalanced rcu_read_{,un}lock() issues of late,
> how about doing something like this:

This will break when rcu_read_lock() and rcu_read_unlock() are invoked
from NMI/SMI handlers -- the raw_local_irq_save() in lock_acquire() will
not mask NMIs or SMIs.

One approach would be to check for being in an NMI/SMI handler, and
to avoid calling lock_acquire() and lock_release() in those cases.
Another approach would be to use sparse, which has checks for
rcu_read_lock()/rcu_read_unlock() nesting.

							Thanx, Paul

> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  include/linux/lockdep.h  |    7 +++++++
>  include/linux/rcupdate.h |   12 ++++++++++++
>  kernel/rcupdate.c        |    8 ++++++++
>  3 files changed, 27 insertions(+)
> 
> Index: linux-2.6/include/linux/lockdep.h
> ===================================================================
> --- linux-2.6.orig/include/linux/lockdep.h
> +++ linux-2.6/include/linux/lockdep.h
> @@ -252,6 +252,13 @@ extern void lockdep_init_map(struct lock
>  			     struct lock_class_key *key, int subclass);
> 
>  /*
> + * To initialize a lockdep_map statically use this macro.
> + * Note that _name must not be NULL.
> + */
> +#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
> +	{ .name = (_name), .key = (void *)(_key), }
> +
> +/*
>   * Reinitialize a lock key - for cases where there is special locking or
>   * special initialization of locks so that the validator gets the scope
>   * of dependencies wrong: they are either too broad (they need a class-split)
> Index: linux-2.6/include/linux/rcupdate.h
> ===================================================================
> --- linux-2.6.orig/include/linux/rcupdate.h
> +++ linux-2.6/include/linux/rcupdate.h
> @@ -41,6 +41,7 @@
>  #include <linux/percpu.h>
>  #include <linux/cpumask.h>
>  #include <linux/seqlock.h>
> +#include <linux/lockdep.h>
> 
>  /**
>   * struct rcu_head - callback structure for use with RCU
> @@ -133,6 +134,15 @@ static inline void rcu_bh_qsctr_inc(int 
>  extern int rcu_pending(int cpu);
>  extern int rcu_needs_cpu(int cpu);
> 
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +extern struct lockdep_map rcu_lock_map;
> +# define rcu_read_acquire()	lock_acquire(&rcu_lock_map, 0, 0, 2, 1, _THIS_IP_)
> +# define rcu_read_release()	lock_release(&rcu_lock_map, 1, _THIS_IP_)
> +#else
> +# define rcu_read_acquire()	do { } while (0)
> +# define rcu_read_release()	do { } while (0)
> +#endif
> +
>  /**
>   * rcu_read_lock - mark the beginning of an RCU read-side critical section.
>   *
> @@ -166,6 +176,7 @@ extern int rcu_needs_cpu(int cpu);
>  	do { \
>  		preempt_disable(); \
>  		__acquire(RCU); \
> +		rcu_read_acquire(); \
>  	} while(0)
> 
>  /**
> @@ -175,6 +186,7 @@ extern int rcu_needs_cpu(int cpu);
>   */
>  #define rcu_read_unlock() \
>  	do { \
> +		rcu_read_release(); \
>  		__release(RCU); \
>  		preempt_enable(); \
>  	} while(0)
> Index: linux-2.6/kernel/rcupdate.c
> ===================================================================
> --- linux-2.6.orig/kernel/rcupdate.c
> +++ linux-2.6/kernel/rcupdate.c
> @@ -49,6 +49,14 @@
>  #include <linux/cpu.h>
>  #include <linux/mutex.h>
> 
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +static struct lock_class_key rcu_lock_key;
> +struct lockdep_map rcu_lock_map =
> +	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key);
> +
> +EXPORT_SYMBOL_GPL(rcu_lock_map);
> +#endif
> +
>  /* Definition for rcupdate control block. */
>  static struct rcu_ctrlblk rcu_ctrlblk = {
>  	.cur = -300,
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?
From: Thomas Graf @ 2007-08-16 15:58 UTC (permalink / raw)
  To: Richard MUSIL; +Cc: Patrick McHardy, netdev
In-Reply-To: <46A5DDDD.6070602@st.com>

* Richard MUSIL <richard.musil@st.com> 2007-07-24 13:09
> Thomas Graf wrote:
> > Please provide a new overall patch which is not based on your
> > initial patch so I can review your idea properly.
> 
> Here it goes (merging two previous patches). I have diffed
> against v2.6.22, which I am using currently as my base:

Sorry for taking so long.

> @@ -150,9 +176,9 @@ int genl_register_ops(struct genl_family *family, struct genl_ops *ops)
>  	if (ops->policy)
>  		ops->flags |= GENL_CMD_CAP_HASPOL;
>  
> -	genl_lock();
> +	genl_fam_lock(family);
>  	list_add_tail(&ops->ops_list, &family->ops_list);
> -	genl_unlock();
> +	genl_fam_unlock(family);

For registering operations, it is sufficient to just acquire the
family lock, the family itself can't disappear while holding it.

> @@ -216,8 +242,9 @@ int genl_register_family(struct genl_family *family)
>  		goto errout;
>  
>  	INIT_LIST_HEAD(&family->ops_list);
> +	mutex_init(&family->lock);
>  
> -	genl_lock();
> +	genl_fam_lock(family);
>  
>  	if (genl_family_find_byname(family->name)) {
>  		err = -EEXIST;
> @@ -251,14 +278,14 @@ int genl_register_family(struct genl_family *family)
>  		family->attrbuf = NULL;
>  
>  	list_add_tail(&family->family_list, genl_family_chain(family->id));
> -	genl_unlock();
> +	genl_fam_unlock(family);

This looks good.

> @@ -303,38 +332,57 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>  	struct genlmsghdr *hdr = nlmsg_data(nlh);
>  	int hdrlen, err;
>  
> +	genl_fam_lock(NULL);
>  	family = genl_family_find_byid(nlh->nlmsg_type);
> -	if (family == NULL)
> +	if (family == NULL) {
> +		genl_fam_unlock(NULL);
>  		return -ENOENT;
> +	}
> +
> +	/* get particular family lock, but release global family lock
> +	 * so registering operations for other families are possible */
> +	genl_onefam_lock(family);
> +	genl_fam_unlock(NULL);

I don't like having two locks for something as trivial as this.
Basically the only reason the global lock is required here is to
protect from family removal which can be avoided otherwise by
using RCU list operations.

Therefore, I'd propose the following lock semantics:
Use own global mutex to protect writing to the family list, make
reading side lockless using rcu for use when looking up family
upon mesage processing. Use a family lock to protect writing to
operations list and serialize messae processing with unregister
operations.

^ permalink raw reply

* Re: [patch 2/4] Update e100 driver to use devres.
From: Brandon Philips @ 2007-08-16 15:54 UTC (permalink / raw)
  To: Tejun Heo; +Cc: e1000-devel, netdev
In-Reply-To: <46C43652.6030609@suse.de>

On 20:34 Thu 16 Aug 2007, Tejun Heo wrote:
> Brandon Philips wrote:
> > @@ -2554,8 +2547,10 @@ static int __devinit e100_probe(struct p
> >  	struct net_device *netdev;
> >  	struct nic *nic;
> >  	int err;
> > +	void __iomem **iomap;
> > +	int bar;
> >  
> > -	if(!(netdev = alloc_etherdev(sizeof(struct nic)))) {
> > +	if (!(netdev = devm_alloc_etherdev(&pdev->dev, sizeof(struct nic)))) {
> >  		if(((1 << debug) - 1) & NETIF_MSG_PROBE)
> >  			printk(KERN_ERR PFX "Etherdev alloc failed, abort.\n");
> >  		return -ENOMEM;
> > @@ -2585,26 +2580,28 @@ static int __devinit e100_probe(struct p
> >  	nic->msg_enable = (1 << debug) - 1;
> >  	pci_set_drvdata(pdev, netdev);
> >  
> > -	if((err = pci_enable_device(pdev))) {
> > +	if ((err = pcim_enable_device(pdev))) {
> >  		DPRINTK(PROBE, ERR, "Cannot enable PCI device, aborting.\n");
> > -		goto err_out_free_dev;
> > +		return err;
> >  	}
> >  
> >  	if(!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM)) {
> >  		DPRINTK(PROBE, ERR, "Cannot find proper PCI device "
> >  			"base address, aborting.\n");
> >  		err = -ENODEV;
> > -		goto err_out_disable_pdev;
> > +		return err;
> >  	}
> >  
> > -	if((err = pci_request_regions(pdev, DRV_NAME))) {
> > +	bar = use_io ? 1 : 0;
> > +	if((err = pcim_iomap_regions(pdev, 1 << bar, DRV_NAME))) {
> >  		DPRINTK(PROBE, ERR, "Cannot obtain PCI resources, aborting.\n");
> > -		goto err_out_disable_pdev;
> > +		return err;
> >  	}
> > +	iomap = pcim_iomap_table(pdev)[bar];
> 
> Type mismatch.  Didn't compiler warn about it?

s/iomap/nic->csr/

> >  
> >  	if((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK))) {
> >  		DPRINTK(PROBE, ERR, "No usable DMA configuration, aborting.\n");
> > -		goto err_out_free_res;
> > +		return err;
> >  	}
> >  
> >  	SET_MODULE_OWNER(netdev);
> > @@ -2613,12 +2610,6 @@ static int __devinit e100_probe(struct p
> >  	if (use_io)
> >  		DPRINTK(PROBE, INFO, "using i/o access mode\n");
> >  
> > -	nic->csr = pci_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
> > -	if(!nic->csr) {
> > -		DPRINTK(PROBE, ERR, "Cannot map device registers, aborting.\n");
> > -		err = -ENOMEM;
> > -		goto err_out_free_res;
> > -	}
> 
> nic->csr initialization seems missing.  Have you tested the patched code?

No I didn't test it; my e100 box gave up on me.  I shouldn't have sent
this patch since I wasn't able to get it tested.

Thanks,

	Brandon

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/

^ permalink raw reply

* Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?
From: Eric Dumazet @ 2007-08-16 15:40 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev
In-Reply-To: <E1ILft1-0001Db-00@gondolin.me.apana.org.au>

On Thu, 16 Aug 2007 21:59:43 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Eric Dumazet <dada1@cosmosbay.com> wrote:
> > 
> > Yes, I already did this (with the current softirq based timer model),
> > but how can dst_dev_event() do its work, since the GC is using 
> > a private list. (In my patch, time to GC process XXX.000 entries is about XX seconds.)
> > 
> > We would have to change dst_dev_event() to :
> > - Signal to GC it has to stop as soon as possible.
> > - Wait for GC be stoped (busy wait I suspect we cannot sleep in dst_dev_event() ? )
> 
> You can sleep in dst_dev_event so just use a mutex to separate
> it from the GC.

Thanks for the suggestion, I added dst_mutex and attempt a mutex_trylock() inside dst_run_gc().

So do you think this patch is enough or should we convert dst_run_gc processing from softirq to workqueue too ?

Eric

[PATCH] net/core/dst.c : let dst_run_gc() do its job incrementally

When the periodic route cache flush is done, we can feed dst_garbage_list with a huge number of dst_entry structs.

Then dst_run_gc() is doing a scan of a possibly huge list of dst_entries, eventually freeing some (less than 1%) of them, while holding the dst_lock spinlock for the whole scan. This sometimes triggers a soft lockup. 

Then it rearms a timer to redo the full thing 1/10 s later. The slowdown can last one minute or so.
No need to say that machine is not really usable in the meantime.

This patch attempts to solve the problem giving dst_run_gc() the ability to perform its work by chunks instead of whole list scan. We limit a chunk by the number of entries that could not be freed in a run. This should limit the CPU cache trashing (currently 128 bytes per entry on x86_64), yet giving a chance to free unreferenced entries (not included in the quota) if the load is really high.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

--- linux-2.6.22/net/core/dst.c.orig
+++ linux-2.6.22/net/core/dst.c
@@ -9,6 +9,7 @@
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
+#include <linux/mutex.h>
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/netdevice.h>
@@ -28,10 +29,17 @@
  * 4) All operations modify state, so a spinlock is used.
  */
 static struct dst_entry 	*dst_garbage_list;
+static struct dst_entry     *dst_garbage_wrk;
+static struct dst_entry     *dst_garbage_inuse;
+static int dst_gc_running;
 #if RT_CACHE_DEBUG >= 2
 static atomic_t			 dst_total = ATOMIC_INIT(0);
 #endif
 static DEFINE_SPINLOCK(dst_lock);
+/*
+ * A mutex is used to synchronize GC with dst_dev_event()
+ */
+static DEFINE_MUTEX(dst_mutex);
 
 static unsigned long dst_gc_timer_expires;
 static unsigned long dst_gc_timer_inc = DST_GC_MAX;
@@ -42,26 +50,39 @@ static DEFINE_TIMER(dst_gc_timer, dst_ru
 
 static void dst_run_gc(unsigned long dummy)
 {
-	int    delayed = 0;
-	int    work_performed;
-	struct dst_entry * dst, **dstp;
+	int    quota = 1000;
+	int    work_performed = 0;
+	struct dst_entry *dst, *next;
+	struct dst_entry *first = NULL, *last = NULL;
 
-	if (!spin_trylock(&dst_lock)) {
+	if (!mutex_trylock(&dst_mutex)) {
 		mod_timer(&dst_gc_timer, jiffies + HZ/10);
 		return;
 	}
+	dst_gc_running = 1;
 
+	spin_lock(&dst_lock);
 	del_timer(&dst_gc_timer);
-	dstp = &dst_garbage_list;
-	work_performed = 0;
-	while ((dst = *dstp) != NULL) {
-		if (atomic_read(&dst->__refcnt)) {
-			dstp = &dst->next;
-			delayed++;
+	if (!dst_garbage_wrk) {
+		dst_garbage_wrk = dst_garbage_list;
+		dst_garbage_list = dst_garbage_inuse;
+		dst_garbage_inuse = NULL;
+	}
+	next = dst_garbage_wrk;
+	spin_unlock(&dst_lock);
+
+	while ((dst = next) != NULL) {
+		next = dst->next;
+		if (likely(atomic_read(&dst->__refcnt))) {
+			if (unlikely(last == NULL))
+				last = dst;
+			dst->next = first;
+			first = dst;
+			if (--quota == 0)
+				break;
 			continue;
 		}
-		*dstp = dst->next;
-		work_performed = 1;
+		work_performed++;
 
 		dst = dst_destroy(dst);
 		if (dst) {
@@ -77,16 +98,26 @@ static void dst_run_gc(unsigned long dum
 				continue;
 
 			___dst_free(dst);
-			dst->next = *dstp;
-			*dstp = dst;
-			dstp = &dst->next;
+			dst->next = next;
+			next = dst;
 		}
 	}
-	if (!dst_garbage_list) {
+	dst_garbage_wrk = next;
+
+	spin_lock(&dst_lock);
+	dst_gc_running = 0;
+	if (last != NULL) {
+		last->next = dst_garbage_inuse;
+		dst_garbage_inuse = first;
+	}
+	/*
+	 * If all lists are empty, no need to rearm a timer
+	 */
+	if (!dst_garbage_list && !dst_garbage_wrk && !dst_garbage_inuse) {
 		dst_gc_timer_inc = DST_GC_MAX;
 		goto out;
 	}
-	if (!work_performed) {
+	if (!work_performed && !dst_garbage_wrk) {
 		if ((dst_gc_timer_expires += dst_gc_timer_inc) > DST_GC_MAX)
 			dst_gc_timer_expires = DST_GC_MAX;
 		dst_gc_timer_inc += DST_GC_INC;
@@ -95,8 +126,11 @@ static void dst_run_gc(unsigned long dum
 		dst_gc_timer_expires = DST_GC_MIN;
 	}
 #if RT_CACHE_DEBUG >= 2
-	printk("dst_total: %d/%d %ld\n",
-	       atomic_read(&dst_total), delayed,  dst_gc_timer_expires);
+	if (net_msg_warn)
+		printk(KERN_DEBUG "dst_run_gc: "
+		"dst_total=%d quota=%d perf=%d expires=%ld\n",
+		atomic_read(&dst_total), quota,
+		work_performed, dst_gc_timer_expires);
 #endif
 	/* if the next desired timer is more than 4 seconds in the future
 	 * then round the timer to whole seconds
@@ -109,6 +143,7 @@ static void dst_run_gc(unsigned long dum
 
 out:
 	spin_unlock(&dst_lock);
+	mutex_unlock(&dst_mutex);
 }
 
 static int dst_discard(struct sk_buff *skb)
@@ -157,7 +192,7 @@ void __dst_free(struct dst_entry * dst)
 	___dst_free(dst);
 	dst->next = dst_garbage_list;
 	dst_garbage_list = dst;
-	if (dst_gc_timer_inc > DST_GC_INC) {
+	if (dst_gc_timer_inc > DST_GC_INC && !dst_gc_running) {
 		dst_gc_timer_inc = DST_GC_INC;
 		dst_gc_timer_expires = DST_GC_MIN;
 		mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires);
@@ -224,7 +259,7 @@ again:
  *
  * Commented and originally written by Alexey.
  */
-static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev,
+static void dst_ifdown(struct dst_entry *dst, struct net_device *dev,
 			      int unregister)
 {
 	if (dst->ops->ifdown)
@@ -251,15 +286,34 @@ static int dst_dev_event(struct notifier
 {
 	struct net_device *dev = ptr;
 	struct dst_entry *dst;
+	struct dst_entry *first, *last;
 
 	switch (event) {
 	case NETDEV_UNREGISTER:
 	case NETDEV_DOWN:
+		mutex_lock(&dst_mutex); /* guard against GC */
+
 		spin_lock_bh(&dst_lock);
-		for (dst = dst_garbage_list; dst; dst = dst->next) {
-			dst_ifdown(dst, dev, event != NETDEV_DOWN);
+		first = dst_garbage_list;
+		if (first) {
+			dst_garbage_list = NULL;
+			spin_unlock_bh(&dst_lock);
+			for (dst = first; dst; dst = dst->next) {
+				last = dst;
+				dst_ifdown(dst, dev, event != NETDEV_DOWN);
+			}
+			spin_lock_bh(&dst_lock);
+			last->next = dst_garbage_list;
+			dst_garbage_list = first;
 		}
 		spin_unlock_bh(&dst_lock);
+		for (dst = dst_garbage_wrk; dst; dst = dst->next) {
+			dst_ifdown(dst, dev, event != NETDEV_DOWN);
+		}
+		for (dst = dst_garbage_inuse; dst; dst = dst->next) {
+			dst_ifdown(dst, dev, event != NETDEV_DOWN);
+		}
+		mutex_unlock(&dst_mutex);
 		break;
 	}
 	return NOTIFY_DONE;

^ permalink raw reply

* Re: [PATCH] nl80211: remove VLAN type
From: Johannes Berg @ 2007-08-16 15:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, John W. Linville, linux-wireless
In-Reply-To: <1187169518.3960.0.camel-YfaajirXv214zXjbi5bjpg@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 385 bytes --]

On Wed, 2007-08-15 at 11:18 +0200, Johannes Berg wrote:
> There is no point in trying to handle source MAC address based VLANs in
> the wireless stack, something like "smacvlan" modeled after macvlan
> should be sufficient.

Whoops. I only forgot the use with encryption. Please consider this
patch withdrawn, at least until I see clearer through that crypto issue.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply

* Re: drivers/infiniband/mlx/mad.c misplaced ;
From: Jeff Dike @ 2007-08-16 14:52 UTC (permalink / raw)
  To: Joe Perches
  Cc: Dave Jones, Linux Kernel, rolandd, Chas Williams, Paul Mundt,
	isdn4linux, mikep, netdev, swen, linux390, linux-s390,
	user-mode-linux-devel, user-mode-linux-user, netfilter-devel,
	coreteam
In-Reply-To: <1187224811.5906.55.camel@localhost>

On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote:
> fs/hostfs/hostfs_user.c:        if(attrs->ia_valid & HOSTFS_ATTR_CTIME) ;

This one can be deleted, but I think I did it for documentation
reasons, to make it clear that ctime handling wasn't left out by
mistake.

				Jeff

-- 
Work email - jdike at linux dot intel dot com

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Ilpo Järvinen @ 2007-08-16 14:48 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, Netdev,
	Andrew Morton, ak, heiko.carstens, David Miller, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher
In-Reply-To: <20070816070907.GA964@gondor.apana.org.au>

On Thu, 16 Aug 2007, Herbert Xu wrote:

> We've been through that already.  If it's a busy-wait it
> should use cpu_relax. 

I looked around a bit by using some command lines and ended up wondering 
if these are equal to busy-wait case (and should be fixed) or not:

./drivers/telephony/ixj.c
6674:   while (atomic_read(&j->DSPWrite) > 0)
6675-           atomic_dec(&j->DSPWrite);

...besides that, there are couple of more similar cases in the same file 
(with braces)...


-- 
 i.

^ permalink raw reply

* [PATCH] lockdep: annotate rcu_read_{,un}lock()
From: Peter Zijlstra @ 2007-08-16 14:25 UTC (permalink / raw)
  To: Paul E McKenney, Ingo Molnar
  Cc: herbert, 123.oleg, netdev, linux-kernel, David Miller,
	Daniel Walker
In-Reply-To: <20070815.144628.104052147.davem@davemloft.net>


There seem to be some unbalanced rcu_read_{,un}lock() issues of late,
how about doing something like this:

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/lockdep.h  |    7 +++++++
 include/linux/rcupdate.h |   12 ++++++++++++
 kernel/rcupdate.c        |    8 ++++++++
 3 files changed, 27 insertions(+)

Index: linux-2.6/include/linux/lockdep.h
===================================================================
--- linux-2.6.orig/include/linux/lockdep.h
+++ linux-2.6/include/linux/lockdep.h
@@ -252,6 +252,13 @@ extern void lockdep_init_map(struct lock
 			     struct lock_class_key *key, int subclass);
 
 /*
+ * To initialize a lockdep_map statically use this macro.
+ * Note that _name must not be NULL.
+ */
+#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
+	{ .name = (_name), .key = (void *)(_key), }
+
+/*
  * Reinitialize a lock key - for cases where there is special locking or
  * special initialization of locks so that the validator gets the scope
  * of dependencies wrong: they are either too broad (they need a class-split)
Index: linux-2.6/include/linux/rcupdate.h
===================================================================
--- linux-2.6.orig/include/linux/rcupdate.h
+++ linux-2.6/include/linux/rcupdate.h
@@ -41,6 +41,7 @@
 #include <linux/percpu.h>
 #include <linux/cpumask.h>
 #include <linux/seqlock.h>
+#include <linux/lockdep.h>
 
 /**
  * struct rcu_head - callback structure for use with RCU
@@ -133,6 +134,15 @@ static inline void rcu_bh_qsctr_inc(int 
 extern int rcu_pending(int cpu);
 extern int rcu_needs_cpu(int cpu);
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+extern struct lockdep_map rcu_lock_map;
+# define rcu_read_acquire()	lock_acquire(&rcu_lock_map, 0, 0, 2, 1, _THIS_IP_)
+# define rcu_read_release()	lock_release(&rcu_lock_map, 1, _THIS_IP_)
+#else
+# define rcu_read_acquire()	do { } while (0)
+# define rcu_read_release()	do { } while (0)
+#endif
+
 /**
  * rcu_read_lock - mark the beginning of an RCU read-side critical section.
  *
@@ -166,6 +176,7 @@ extern int rcu_needs_cpu(int cpu);
 	do { \
 		preempt_disable(); \
 		__acquire(RCU); \
+		rcu_read_acquire(); \
 	} while(0)
 
 /**
@@ -175,6 +186,7 @@ extern int rcu_needs_cpu(int cpu);
  */
 #define rcu_read_unlock() \
 	do { \
+		rcu_read_release(); \
 		__release(RCU); \
 		preempt_enable(); \
 	} while(0)
Index: linux-2.6/kernel/rcupdate.c
===================================================================
--- linux-2.6.orig/kernel/rcupdate.c
+++ linux-2.6/kernel/rcupdate.c
@@ -49,6 +49,14 @@
 #include <linux/cpu.h>
 #include <linux/mutex.h>
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static struct lock_class_key rcu_lock_key;
+struct lockdep_map rcu_lock_map =
+	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key);
+
+EXPORT_SYMBOL_GPL(rcu_lock_map);
+#endif
+
 /* Definition for rcupdate control block. */
 static struct rcu_ctrlblk rcu_ctrlblk = {
 	.cur = -300,



^ permalink raw reply

* Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?
From: Herbert Xu @ 2007-08-16 13:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: herbert, netdev
In-Reply-To: <20070816153619.2c699e21.dada1@cosmosbay.com>

Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
> Yes, I already did this (with the current softirq based timer model),
> but how can dst_dev_event() do its work, since the GC is using 
> a private list. (In my patch, time to GC process XXX.000 entries is about XX seconds.)
> 
> We would have to change dst_dev_event() to :
> - Signal to GC it has to stop as soon as possible.
> - Wait for GC be stoped (busy wait I suspect we cannot sleep in dst_dev_event() ? )

You can sleep in dst_dev_event so just use a mutex to separate
it from the GC.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: PROBLEM: 2.6.23-rc "NETDEV WATCHDOG: eth0: transmit timed out"
From: Francois Romieu @ 2007-08-16 14:11 UTC (permalink / raw)
  To: Karl Meyer; +Cc: linux-kernel, netdev
In-Reply-To: <20070814224629.GB3495@electric-eye.fr.zoreil.com>

[-- Attachment #1: Type: text/plain, Size: 524 bytes --]

(please do not remove the netdev Cc:)

Francois Romieu <romieu@fr.zoreil.com> :
[...]
> If it does not work I'll dissect 0e4851502f846b13b29b7f88f1250c980d57e944
> tomorrow.

You will find a tgz archive in attachment which contains a serie of patches
(0001-... to 0005-...) to walk from 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2
to 0e4851502f846b13b29b7f88f1250c980d57e944 in smaller steps.

Please apply 0001 on top of 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2. If it
still works, apply 0002 on top of 0001, etc.

-- 
Ueimor

[-- Attachment #2: r8169-meyer.tgz --]
[-- Type: application/x-gzip, Size: 3538 bytes --]

^ permalink raw reply

* Re: drivers/infiniband/mlx/mad.c misplaced ;
From: Andy Whitcroft @ 2007-08-16 14:05 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Joe Perches, Randy.Dunlap, Joel Schopp, Dave Jones, Linux Kernel,
	rolandd, Chas Williams, Paul Mundt, isdn4linux, mikep, netdev,
	swen, linux390, linux-s390, jdike, user-mode-linux-devel,
	user-mode-linux-user, netfilter-devel, coreteam
In-Reply-To: <46C3B428.4000003@intel.com>

Kok, Auke wrote:
> Joe Perches wrote:
>> On Wed, 2007-08-15 at 19:58 -0400, Dave Jones wrote:
>>> Signed-off-by: Dave Jones <davej@redhat.com>
>>>
>>> diff --git a/drivers/infiniband/hw/mlx4/mad.c
>>> b/drivers/infiniband/hw/mlx4/mad.c
>>> index 3330917..0ed02b7 100644
>>> --- a/drivers/infiniband/hw/mlx4/mad.c
>>> +++ b/drivers/infiniband/hw/mlx4/mad.c
>>> @@ -109,7 +109,7 @@ int mlx4_MAD_IFC(struct mlx4_ib_dev *dev, int
>>> ignore_mkey, int ignore_bkey,
>>>                 in_modifier, op_modifier,
>>>                 MLX4_CMD_MAD_IFC, MLX4_CMD_TIME_CLASS_C);
>>>  
>>> -    if (!err);
>>> +    if (!err)
>>
>> There's more than a few of these (not inspected).
>>
>> $ egrep -r --include=*.c "\bif[[:space:]]*\([^\)]*\)[[:space:]]*\;" *
>> arch/sh/boards/se/7343/io.c:    if (0) ;
>> drivers/atm/iphase.c:     if (!desc1) ;
>> drivers/infiniband/hw/mlx4/mad.c:       if (!err);
>> drivers/isdn/capi/capiutil.c:   else if (c <= 0x0f);
>> drivers/net/tokenring/ibmtr.c:  else if (ti->shared_ram_paging ==
>> 0xf);  /* No paging in adapter */
>> drivers/s390/scsi/zfcp_erp.c:           if (status ==
>> ZFCP_ERP_SUCCEEDED) ;     /* no further action */
>> fs/hostfs/hostfs_user.c:        if(attrs->ia_valid & HOSTFS_ATTR_CTIME) ;
>> net/netfilter/xt_u32.c:         if (skb->len < 4 || pos > skb->len - 4);
>> sound/pci/au88x0/au88x0_synth.c:                                if
>> (eax == 0) ;
> 
> sounds like an excellent candidate check for checkpatch.pl !!!

Yep.  My scan of 2.6.23-rc3 with a checkpatch with this new test added,
found 6 which seemed wrong.

-apw

^ permalink raw reply

* [PATCH] use correct array index. (array is just 6 bytes long)
From: Marcus Meissner @ 2007-08-16 14:06 UTC (permalink / raw)
  To: netdev; +Cc: linux-tr, mikep

From: Marcus Meissner <meissner@suse.de>

Use correct array index (goes from 0-6 instead of 10-16).

Signed-Off-By: Marcus Meissner <meissner@suse.de>
---
 drivers/net/tokenring/3c359.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/tokenring/3c359.c b/drivers/net/tokenring/3c359.c
index 9f1b6ab..5e2437e 100644
--- a/drivers/net/tokenring/3c359.c
+++ b/drivers/net/tokenring/3c359.c
@@ -763,7 +763,7 @@ static int xl_open_hw(struct net_device
 	if (xl_priv->xl_laa[0]) {  /* If using a LAA address */
 		for (i=10;i<16;i++) { 
 			writel( (MEM_BYTE_WRITE | 0xD0000 | xl_priv->srb) + i, xl_mmio + MMIO_MAC_ACCESS_CMD) ; 
-			writeb(xl_priv->xl_laa[i],xl_mmio + MMIO_MACDATA) ; 
+			writeb(xl_priv->xl_laa[i-10],xl_mmio + MMIO_MACDATA) ;
 		}
 		memcpy(dev->dev_addr,xl_priv->xl_laa,dev->addr_len) ; 
 	} else { /* Regular hardware address */ 
-- 
1.4.3.4

^ permalink raw reply related

* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Tom Tucker @ 2007-08-16 13:43 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, rdreier, linux-kernel, general, David Miller
In-Reply-To: <46C3B5EF.5060409@garzik.org>

On Wed, 2007-08-15 at 22:26 -0400, Jeff Garzik wrote:

[...snip...]

> > I think removing the RDMA stack is the wrong thing to do, and you 
> > shouldn't just threaten to yank entire subsystems because you don't like 
> > the technology.  Lets keep this constructive, can we?  RDMA should get 
> > the respect of any other technology in Linux.  Maybe its a niche in your 
> > opinion, but come on, there's more RDMA users than say, the sparc64 
> > port.  Eh?
> 
> It's not about being a niche.  It's about creating a maintainable 
> software net stack that has predictable behavior.

Isn't RDMA _part_ of the "software net stack" within Linux? Why isn't
making RDMA stable, supportable and maintainable equally as important as
any other subsystem? 

> 
> Needing to reach out of the RDMA sandbox and reserve net stack resources 
> away from itself travels a path we've consistently avoided.
> 
> 
> >> I will NACK any patch that opens up sockets to eat up ports or
> >> anything stupid like that.
> > 
> > Got it.
> 
> Ditto for me as well.
> 
> 	Jeff
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?
From: Eric Dumazet @ 2007-08-16 13:36 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev
In-Reply-To: <E1ILefm-00010m-00@gondolin.me.apana.org.au>

On Thu, 16 Aug 2007 20:41:58 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Eric Dumazet <dada1@cosmosbay.com> wrote:
> >
> > I am wondering how to really solve the problem. Could a workqueue be used here instead of a timer ?
> 
> I think so.
> 
> A mutex would separate the GC and dst_ifdown.  You'd take the
> spin lock in the GC only to replace the garbage list with NULL.
> You then drop the lock to process it.  Once it's done you take
> the lock again and join it with whatever that's been added in
> the mean time.  This is easy because you should already have
> the tail after the GC process.

Thanks Herbert

Yes, I already did this (with the current softirq based timer model),
 but how can dst_dev_event() do its work, since the GC is using 
a private list. (In my patch, time to GC process XXX.000 entries is about XX seconds.)

We would have to change dst_dev_event() to :
- Signal to GC it has to stop as soon as possible.
- Wait for GC be stoped (busy wait I suspect we cannot sleep in dst_dev_event() ? )
, giving us a full garbage_list.
- Process the whole list.
- Restart GC


^ permalink raw reply

* [PATCH 4/4] net: netdev_budget rearrangement
From: Stephen Hemminger @ 2007-08-16 13:25 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20070816132523.201766718@linux-foundation.org>

[-- Attachment #1: netdev_budget.patch --]
[-- Type: text/plain, Size: 867 bytes --]

Trivial patch, move netdev_budget declaration out of netdevice.h
to a new home with the other sysctl externs.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>

--- a/include/linux/netdevice.h	2007-08-16 06:38:33.000000000 -0400
+++ b/include/linux/netdevice.h	2007-08-16 09:24:08.000000000 -0400
@@ -811,8 +811,6 @@ extern int		dev_hard_start_xmit(struct s
 					    struct net_device *dev);
 extern struct net_device_stats	*dev_get_stats(struct net_device *dev);
 
-extern int		netdev_budget;
-
 /* Called by rtnetlink.c:rtnl_unlock() */
 extern void netdev_run_todo(void);
 
--- a/net/core/sysctl_net_core.c	2007-08-06 04:26:48.000000000 -0400
+++ b/net/core/sysctl_net_core.c	2007-08-16 09:24:08.000000000 -0400
@@ -13,6 +13,7 @@
 
 #ifdef CONFIG_SYSCTL
 
+extern int netdev_budget;
 extern int netdev_max_backlog;
 extern int weight_p;
 

-- 


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox