Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 24/24] document volatile atomic_read() behavior
From: Chris Friesen @ 2007-08-09 20:10 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Chris Snook, wjiang, wensong, heiko.carstens, linux-kernel, ak,
	netdev, horms, akpm, linux-arch, torvalds, schwidefsky, davem,
	zlynx, rpjday, jesper.juhl
In-Reply-To: <0a08872e608cf5f7a3d9c0fc746a1051@kernel.crashing.org>

Segher Boessenkool wrote:

> Anyway, what's the supposed advantage of *(volatile *) vs. using
> a real volatile object?  That you can access that same object in
> a non-volatile way?

That's my understanding.  That way accesses where you don't care about 
volatility may be optimised.

For instance, in cases where there are already other things controlling 
visibility (as are needed for atomic increment, for example) you don't 
need to make the access itself volatile.

Chris

^ permalink raw reply

* Re: [patch 1/5][RFC] NET: Change pci_enable_device topci_reenable_device to keep device enable balance
From: Brandon Philips @ 2007-08-09 20:45 UTC (permalink / raw)
  To: Ramkrishna Vepa; +Cc: netdev, teheo, e1000-devel, auke-jan.h.kok
In-Reply-To: <78C9135A3D2ECE4B8162EBDCE82CAD7702035646@nekter>

On 17:30 Wed 08 Aug 2007, Ramkrishna Vepa wrote:
> Before slot_reset event is called io_error_detected could be called
> (where pci_disable_device() is called), right? 

Oops!  Right, the documentation says .error_detected is _always_ called
before .slot_reset.  So, this patch is not correct.  Please don't merge
this.

>From Documentation/pci-error-recovery.txt:

STEP 1: Notification
--------------------
Platform calls the error_detected() callback on every instance of
every driver affected by the error.
...

If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET),
then recovery proceeds to STEP 4 (Slot Reset).

> The pci_reenable_device() will call enable only if the device was
> enabled before and would not be enabled if the device were disabled. Is
> this the intended behavior?

Yes, you are right.  And no it isn't.

Thanks,

	Brandon

> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org
> [mailto:netdev-owner@vger.kernel.org]
> > On Behalf Of Brandon Philips
> > Sent: Thursday, August 02, 2007 3:44 PM
> > To: netdev@vger.kernel.org
> > Cc: teheo@suse.de; Brandon Philips
> > Subject: [patch 1/5][RFC] NET: Change pci_enable_device
> > topci_reenable_device to keep device enable balance
> > 
> > On a slot_reset event pci_disable_device() is never called so calling
> > pci_enable_device() will unbalance the enable count.
> > 
> > Signed-off-by: Brandon Philips <bphilips@suse.de>
> > 
> > ---
> >  drivers/net/e100.c             |    2 +-
> >  drivers/net/e1000/e1000_main.c |    2 +-
> >  drivers/net/ixgb/ixgb_main.c   |    2 +-
> >  drivers/net/s2io.c             |    2 +-
> >  4 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > Index: linux-2.6/drivers/net/e100.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/net/e100.c
> > +++ linux-2.6/drivers/net/e100.c
> > @@ -2828,7 +2828,7 @@ static pci_ers_result_t e100_io_slot_res
> >  	struct net_device *netdev = pci_get_drvdata(pdev);
> >  	struct nic *nic = netdev_priv(netdev);
> > 
> > -	if (pci_enable_device(pdev)) {
> > +	if (pci_reenable_device(pdev)) {
> >  		printk(KERN_ERR "e100: Cannot re-enable PCI device after
> > reset.\n");
> >  		return PCI_ERS_RESULT_DISCONNECT;
> >  	}
> > Index: linux-2.6/drivers/net/e1000/e1000_main.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/net/e1000/e1000_main.c
> > +++ linux-2.6/drivers/net/e1000/e1000_main.c
> > @@ -5270,7 +5270,7 @@ static pci_ers_result_t e1000_io_slot_re
> >  	struct net_device *netdev = pci_get_drvdata(pdev);
> >  	struct e1000_adapter *adapter = netdev->priv;
> > 
> > -	if (pci_enable_device(pdev)) {
> > +	if (pci_reenable_device(pdev)) {
> >  		printk(KERN_ERR "e1000: Cannot re-enable PCI device
> after
> > reset.\n");
> >  		return PCI_ERS_RESULT_DISCONNECT;
> >  	}
> > Index: linux-2.6/drivers/net/ixgb/ixgb_main.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/net/ixgb/ixgb_main.c
> > +++ linux-2.6/drivers/net/ixgb/ixgb_main.c
> > @@ -2294,7 +2294,7 @@ static pci_ers_result_t ixgb_io_slot_res
> >  	struct net_device *netdev = pci_get_drvdata(pdev);
> >  	struct ixgb_adapter *adapter = netdev_priv(netdev);
> > 
> > -	if(pci_enable_device(pdev)) {
> > +	if(pci_reenable_device(pdev)) {
> >  		DPRINTK(PROBE, ERR, "Cannot re-enable PCI device after
> > reset.\n");
> >  		return PCI_ERS_RESULT_DISCONNECT;
> >  	}
> > Index: linux-2.6/drivers/net/s2io.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/net/s2io.c
> > +++ linux-2.6/drivers/net/s2io.c
> > @@ -7833,7 +7833,7 @@ static pci_ers_result_t s2io_io_slot_res
> >  	struct net_device *netdev = pci_get_drvdata(pdev);
> >  	struct s2io_nic *sp = netdev->priv;
> > 
> > -	if (pci_enable_device(pdev)) {
> > +	if (pci_reenable_device(pdev)) {
> >  		printk(KERN_ERR "s2io: "
> >  		       "Cannot re-enable PCI device after reset.\n");
> >  		return PCI_ERS_RESULT_DISCONNECT;
> > 
> > --
> > -
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [patch 1/5][RFC] NET: Change pci_enable_device topci_reenable_device to keep device enable balance
From: Kok, Auke @ 2007-08-09 20:49 UTC (permalink / raw)
  To: Brandon Philips; +Cc: Ramkrishna Vepa, netdev, teheo, e1000-devel
In-Reply-To: <20070809204552.GA6870@ifup.org>

Brandon Philips wrote:
> On 17:30 Wed 08 Aug 2007, Ramkrishna Vepa wrote:
>> Before slot_reset event is called io_error_detected could be called
>> (where pci_disable_device() is called), right? 
> 
> Oops!  Right, the documentation says .error_detected is _always_ called
> before .slot_reset.  So, this patch is not correct.  Please don't merge
> this.
> 
> From Documentation/pci-error-recovery.txt:
> 
> STEP 1: Notification
> --------------------
> Platform calls the error_detected() callback on every instance of
> every driver affected by the error.
> ...
> 
> If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET),
> then recovery proceeds to STEP 4 (Slot Reset).
> 
>> The pci_reenable_device() will call enable only if the device was
>> enabled before and would not be enabled if the device were disabled. Is
>> this the intended behavior?
> 
> Yes, you are right.  And no it isn't.


it would be great if you can convince Linas Vepstas to review and ack your patch 
- he's the one who knows the error handling code the best.

Auke

^ permalink raw reply

* RE: [PATCH 9/24] make atomic_read() behave consistently on ia64
From: Luck, Tony @ 2007-08-09 21:03 UTC (permalink / raw)
  To: Chris Snook, linux-kernel, linux-arch, torvalds
  Cc: netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl
In-Reply-To: <20070809135107.GA15985@shell.boston.redhat.com>

> +#define atomic_read(v)		(*(volatile __s32 *)&(v)->counter)
> +#define atomic64_read(v)	(*(volatile __s64 *)&(v)->counter)
> 
>  #define atomic_set(v,i)		(((v)->counter) = (i))
>  #define atomic64_set(v,i)	(((v)->counter) = (i))

Losing the volatile from the "set" variants definitely changes
the code generated.  Before the patch gcc would give us:

	st4.rel [r37]=r9

after
	st4 [r37]=r9

It is unclear whether anyone relies on (or even whether they should
rely on) the release semantics that are provided by the current
version of atomic.h.  But making this change would require an
audit of all the uses of atomic_set() to find an answer.

There is a more worrying difference in the generated code (this
from the ancient and venerable gcc 3.4.6 that is on my build
machine).   In rwsem_down_failed_common I see this change (after
disassembling vmlinux, I used sed to zap the low 32-bits of addresses
to make the diff manageable ... that's why the addresses all end
in xxxxxxxx):

712868,712873c712913,712921
< a0000001xxxxxxxx:                 adds r16=-1,r30
< a0000001xxxxxxxx:     [MII]       ld8.acq r33=[r32]
< a0000001xxxxxxxx:                 nop.i 0x0;;
< a0000001xxxxxxxx:                 add r42=r33,r16
< a0000001xxxxxxxx:     [MMI]       mov.m ar.ccv=r33;;
< a0000001xxxxxxxx:                 cmpxchg8.acq r34=[r32],r42,ar.ccv
---
> a0000001xxxxxxxx:                 adds r16=-1,r31
> a0000001xxxxxxxx:     [MMI]       ld4.acq r14=[r32];;
> a0000001xxxxxxxx:                 nop.m 0x0
> a0000001xxxxxxxx:                 sxt4 r34=r14
> a0000001xxxxxxxx:     [MMI]       nop.m 0x0;;
> a0000001xxxxxxxx:                 nop.m 0x0
> a0000001xxxxxxxx:                 add r15=r34,r16
> a0000001xxxxxxxx:     [MMI]       mov.m ar.ccv=r34;;
> a0000001xxxxxxxx:                 cmpxchg8.acq r42=[r32],r15,ar.ccv

This code is probably from the rwsem_atomic_update(adjustment, sem) macro
which is cpp'd to atomic64_add_return().  It looks really bad that the new
code reads a 32-bit value and sign extends it rather than reading a 64-bit
value (but I'm perplexed as to why this patch provoked this change in the
generated code).

-Tony

^ permalink raw reply

* Re: [PATCH 3/14] nes: connection manager routines
From: Steve Wise @ 2007-08-09 21:06 UTC (permalink / raw)
  To: Glenn Grundstrom; +Cc: Andi Kleen, rdreier, ewg, netdev
In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC07443E9D@venom2>

Glenn Grundstrom wrote:
> This code is far from a TCP stack.  It's main purpose is to exchange
> RDMA MPA request/response messages that are required by the iWarp specs
> and therefore needed by our hardware.  All RNIC hardware vendors need
> this MPA message exchange, including those already accepted into
> kernel.org.  Do you have an alternative suggestion?
> 

For the record, the existing linux RNICS don't exactly work this way:

Ammasso handles all TCP connection setup and MPA negotiation in HW/FW. 
The connection setup interface between the RNIC and host driver are 
strictly for RDMA connections.

Chelsio has a message based interface to the HW/FW for establishing
TCP connections and exchanging data in streaming mode.  No TCP
processing is done in the driver.

Steve.

^ permalink raw reply

* [PATCH] e1000: Add device IDs of new 82571 board variants
From: Auke Kok @ 2007-08-09 21:09 UTC (permalink / raw)
  To: jeff; +Cc: netdev, bruce.w.allan, john.ronciak

This patch adds support for 2 new board variants:
- A Quad port fiber 82571 board
- A blade version of the 82571 quad copper board

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
---

 drivers/net/e1000/e1000_ethtool.c |    2 ++
 drivers/net/e1000/e1000_hw.c      |    5 +++++
 drivers/net/e1000/e1000_hw.h      |    3 +++
 drivers/net/e1000/e1000_main.c    |    4 ++++
 4 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c
index c90c92e..4c3785c 100644
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -1706,6 +1706,7 @@ static int e1000_wol_exclusion(struct e1000_adapter *adapter, struct ethtool_wol
 	case E1000_DEV_ID_82545EM_COPPER:
 	case E1000_DEV_ID_82546GB_QUAD_COPPER:
 	case E1000_DEV_ID_82546GB_PCIE:
+	case E1000_DEV_ID_82571EB_SERDES_QUAD:
 		/* these don't support WoL at all */
 		wol->supported = 0;
 		break;
@@ -1723,6 +1724,7 @@ static int e1000_wol_exclusion(struct e1000_adapter *adapter, struct ethtool_wol
 		retval = 0;
 		break;
 	case E1000_DEV_ID_82571EB_QUAD_COPPER:
+	case E1000_DEV_ID_82571EB_QUAD_FIBER:
 	case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
 	case E1000_DEV_ID_82546GB_QUAD_COPPER_KSP3:
 		/* quad port adapters only support WoL on port A */
diff --git a/drivers/net/e1000/e1000_hw.c b/drivers/net/e1000/e1000_hw.c
index 9be4469..ba120f7 100644
--- a/drivers/net/e1000/e1000_hw.c
+++ b/drivers/net/e1000/e1000_hw.c
@@ -384,7 +384,10 @@ e1000_set_mac_type(struct e1000_hw *hw)
 	case E1000_DEV_ID_82571EB_COPPER:
 	case E1000_DEV_ID_82571EB_FIBER:
 	case E1000_DEV_ID_82571EB_SERDES:
+	case E1000_DEV_ID_82571EB_SERDES_DUAL:
+	case E1000_DEV_ID_82571EB_SERDES_QUAD:
 	case E1000_DEV_ID_82571EB_QUAD_COPPER:
+	case E1000_DEV_ID_82571EB_QUAD_FIBER:
 	case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
 		hw->mac_type = e1000_82571;
 		break;
@@ -485,6 +488,8 @@ e1000_set_media_type(struct e1000_hw *hw)
     case E1000_DEV_ID_82545GM_SERDES:
     case E1000_DEV_ID_82546GB_SERDES:
     case E1000_DEV_ID_82571EB_SERDES:
+    case E1000_DEV_ID_82571EB_SERDES_DUAL:
+    case E1000_DEV_ID_82571EB_SERDES_QUAD:
     case E1000_DEV_ID_82572EI_SERDES:
     case E1000_DEV_ID_80003ES2LAN_SERDES_DPT:
         hw->media_type = e1000_media_type_internal_serdes;
diff --git a/drivers/net/e1000/e1000_hw.h b/drivers/net/e1000/e1000_hw.h
index bd000b8..fe87146 100644
--- a/drivers/net/e1000/e1000_hw.h
+++ b/drivers/net/e1000/e1000_hw.h
@@ -475,7 +475,10 @@ int32_t e1000_check_phy_reset_block(struct e1000_hw *hw);
 #define E1000_DEV_ID_82571EB_FIBER       0x105F
 #define E1000_DEV_ID_82571EB_SERDES      0x1060
 #define E1000_DEV_ID_82571EB_QUAD_COPPER 0x10A4
+#define E1000_DEV_ID_82571EB_QUAD_FIBER  0x10A5
 #define E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE  0x10BC
+#define E1000_DEV_ID_82571EB_SERDES_DUAL 0x10D9
+#define E1000_DEV_ID_82571EB_SERDES_QUAD 0x10DA
 #define E1000_DEV_ID_82572EI_COPPER      0x107D
 #define E1000_DEV_ID_82572EI_FIBER       0x107E
 #define E1000_DEV_ID_82572EI_SERDES      0x107F
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index f48b659..4a22595 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -100,6 +100,7 @@ static struct pci_device_id e1000_pci_tbl[] = {
 	INTEL_E1000_ETHERNET_DEVICE(0x1099),
 	INTEL_E1000_ETHERNET_DEVICE(0x109A),
 	INTEL_E1000_ETHERNET_DEVICE(0x10A4),
+	INTEL_E1000_ETHERNET_DEVICE(0x10A5),
 	INTEL_E1000_ETHERNET_DEVICE(0x10B5),
 	INTEL_E1000_ETHERNET_DEVICE(0x10B9),
 	INTEL_E1000_ETHERNET_DEVICE(0x10BA),
@@ -107,6 +108,8 @@ static struct pci_device_id e1000_pci_tbl[] = {
 	INTEL_E1000_ETHERNET_DEVICE(0x10BC),
 	INTEL_E1000_ETHERNET_DEVICE(0x10C4),
 	INTEL_E1000_ETHERNET_DEVICE(0x10C5),
+	INTEL_E1000_ETHERNET_DEVICE(0x10D9),
+	INTEL_E1000_ETHERNET_DEVICE(0x10DA),
 	/* required last entry */
 	{0,}
 };
@@ -1096,6 +1099,7 @@ e1000_probe(struct pci_dev *pdev,
 		break;
 	case E1000_DEV_ID_82546GB_QUAD_COPPER_KSP3:
 	case E1000_DEV_ID_82571EB_QUAD_COPPER:
+	case E1000_DEV_ID_82571EB_QUAD_FIBER:
 	case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
 		/* if quad port adapter, disable WoL on all but port A */
 		if (global_quad_port_a != 0)

^ permalink raw reply related

* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Sean Hefty @ 2007-08-09 21:40 UTC (permalink / raw)
  To: Steve Wise
  Cc: netdev, Roland Dreier, David S. Miller, OpenFabrics General,
	linux-kernel
In-Reply-To: <46BB61D0.4090101@opengridcomputing.com>

Steve Wise wrote:
> Any more comments?

Does anyone have ideas on how to reserve the port space without using a 
struct socket?

- Sean

^ permalink raw reply

* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: David Miller @ 2007-08-09 21:55 UTC (permalink / raw)
  To: mshefty; +Cc: netdev, rdreier, linux-kernel, general
In-Reply-To: <46BB89C0.4040303@ichips.intel.com>

From: Sean Hefty <mshefty@ichips.intel.com>
Date: Thu, 09 Aug 2007 14:40:16 -0700

> Steve Wise wrote:
> > Any more comments?
> 
> Does anyone have ideas on how to reserve the port space without using a 
> struct socket?

How about we just remove the RDMA stack altogether?  I am not at all
kidding.  If you guys can't stay in your sand box and need to cause
problems for the normal network stack, it's unacceptable.  We were
told all along the if RDMA went into the tree none of this kind of
stuff would be an issue.

These are exactly the kinds of problems for which people like myself
were dreading.  These subsystems have no buisness using the TCP port
space of the Linux software stack, absolutely none.

After TCP port reservation, what's next?  It seems an at least
bi-monthly event that the RDMA folks need to put their fingers
into something else in the normal networking stack.  No more.

I will NACK any patch that opens up sockets to eat up ports or
anything stupid like that.

^ permalink raw reply

* Re: [PATCH] [NET] ethtool: Add LRO support
From: David Miller @ 2007-08-09 22:00 UTC (permalink / raw)
  To: auke-jan.h.kok; +Cc: jeff, netdev, ossthema
In-Reply-To: <20070809164117.9907.23351.stgit@localhost.localdomain>

From: Auke Kok <auke-jan.h.kok@intel.com>
Date: Thu, 09 Aug 2007 09:41:17 -0700

> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>

I think this is definitely how we should handle LRO
configuration instead of the ad-hoc module parameters
current LRO drivers use now.

I'll put this infrastructure into net-2.6.24, it would
be nice if the EHEA and myri10g folks submitted conversions.

Thanks.

^ permalink raw reply

* Re: [PATCH 24/24] document volatile atomic_read() behavior
From: Segher Boessenkool @ 2007-08-09 22:23 UTC (permalink / raw)
  To: Chris Friesen
  Cc: wjiang, rpjday, wensong, heiko.carstens, linux-kernel, ak, netdev,
	horms, akpm, linux-arch, jesper.juhl, torvalds, schwidefsky,
	davem, zlynx, Chris Snook
In-Reply-To: <46BB74B9.4070702@nortel.com>

>> Anyway, what's the supposed advantage of *(volatile *) vs. using
>> a real volatile object?  That you can access that same object in
>> a non-volatile way?
>
> That's my understanding.  That way accesses where you don't care about 
> volatility may be optimised.

But those accesses might be done non-atomically then (for example,
if the compiler knows it only needs to write one byte, it might not
bother writing the rest), so that's no good if you want to read the
thing atomically too.

> For instance, in cases where there are already other things 
> controlling visibility (as are needed for atomic increment, for 
> example) you don't need to make the access itself volatile.

Hrm, you mean within a lock or similar?  You'll get the same semantics
as volatile anyway there.


Segher


^ permalink raw reply

* Re: [PATCH 24/24] document volatile atomic_read() behavior
From: Segher Boessenkool @ 2007-08-09 22:34 UTC (permalink / raw)
  To: Chris Snook
  Cc: wjiang, wensong, heiko.carstens, linux-kernel, ak, cfriesen,
	netdev, horms, akpm, linux-arch, torvalds, schwidefsky, davem,
	zlynx, rpjday, jesper.juhl
In-Reply-To: <46BB7371.7060008@redhat.com>

>> Anyway, what's the supposed advantage of *(volatile *) vs. using
>> a real volatile object?  That you can access that same object in
>> a non-volatile way?
>
> You'll have to take that up with Linus and the minds behind Volatile 
> Considered Harmful, but the crux of it is that volatile objects are 
> prone to compiler bugs too, and if we have to track down a compiler 
> bug, it's a lot easier when we know exactly where the load is supposed 
> to be because we deliberately put it there, rather than letting the 
> compiler re-order everything that lacks a strict data dependency and 
> trying to figure out where in a thousand lines of assembler the 
> compiler should have put the load for the volatile object.

So, why not do the access explicitly via an inline asm?  It
generates the same code, it's obviously correct, and it's
even *actually* correct.  Plus, you get good compiler
support (and compiler people support).

> If we're going to assume that the compiler has bugs we'll never be 
> able to find, we all need to find new careers.

If we cannot find the bug in finite time, we cannot observe
the bug in finite time either, so either way that's fine :-)

> If we're going to assume that it has bugs we *can* find, then let's 
> use code that makes it easier to do that.

And I'm saying this is a step in the wrong direction for that.

> I initially proposed a patch that made all the objects volatile, on 
> the grounds that this was a special case where there wasn't much room 
> to have a misunderstanding that resulted in anything worse than wasted 
> loads.  Linus objected, and now that I've seen all the responses to 
> the new patchset, I understand exactly why.  If our compilers really 
> suck as much as everyone says they do, it'll be much easier to detect 
> that with volatile casts than with volatile declarations.

Except that accesses via volatile pointer casts open up a whole
new can of worms.


Segher

^ permalink raw reply

* Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
From: Segher Boessenkool @ 2007-08-09 23:02 UTC (permalink / raw)
  To: Chris Snook
  Cc: paulmck, heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
	cfriesen, akpm, torvalds, jesper.juhl, Geert Uytterhoeven,
	linux-arch, zlynx, schwidefsky, davem, wensong, wjiang
In-Reply-To: <46BB6F65.8040204@redhat.com>

>>>> So, why not use the well-defined alternative?
>>> Because we don't need to, and it hurts performance.
>> It hurts performance by implementing 32-bit atomic reads in assembler?
>
> No, I misunderstood the question.  Implementing 32-bit atomic reads in 
> assembler is redundant, because any sane compiler, *particularly* and 
> optimizing compiler (and we're only in this mess because of optimizing 
> compilers)

Oh please, don't tell me you don't want an optimising compiler.
And if you _do_ want one, well you're in this mess because you
chose C as implementation language and C has some pretty strange
rules.  Trying to use not-all-that-well-defined-and-completely-
misunderstood features of the language doesn't make things easier;
trying to use something that isn't even part of the language and
that your particular compiler originally supported by accident,
and that isn't yet an officially supported feature, and that on
top of it all has a track record of problems -- well it makes me
wonder if you're in this game for fun or what.

>  will give us that automatically without the assembler.

No, it does *not* give it to you automatically; you have to do
either the asm() thing, or the not-defined-at-all *(volatile *)&
thing.

> Yes, it is legal for a compiler to violate this assumption.  It is 
> also legal for us to refuse to maintain compatibility with compilers 
> that suck this badly.

So that's rm include/linux/compiler-gcc*.h then.  Good luck with
the intel compiler, maybe it works more to your liking.

Segher

^ permalink raw reply

* Re: [PATCH] e1000e: Fix header includes
From: Chuck Ebbert @ 2007-08-09 23:06 UTC (permalink / raw)
  To: Auke Kok; +Cc: jeff, akpm, netdev, john.ronciak
In-Reply-To: <20070808231525.27519.93728.stgit@localhost.localdomain>

On 08/08/2007 07:15 PM, Auke Kok wrote:
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
> ---
> 
>  drivers/net/e1000e/82571.c   |    4 ++++
>  drivers/net/e1000e/e1000.h   |    7 ++++---
>  drivers/net/e1000e/es2lan.c  |    5 +++++
>  drivers/net/e1000e/ethtool.c |    3 ++-
>  drivers/net/e1000e/hw.h      |    2 ++
>  drivers/net/e1000e/ich8lan.c |    5 +++++
>  drivers/net/e1000e/lib.c     |    2 ++
>  drivers/net/e1000e/netdev.c  |    1 +
>  8 files changed, 25 insertions(+), 4 deletions(-)
> 

Still broken on ppc:

drivers/net/e1000e/phy.c: In function 'e1000e_copper_link_setup_igp':
drivers/net/e1000e/phy.c:512: error: implicit declaration of function 'msleep'
drivers/net/e1000e/phy.c: In function 'e1000e_phy_has_link_generic':
drivers/net/e1000e/phy.c:1339: error: implicit declaration of function 'mdelay'


^ permalink raw reply

* Re: [PATCH] e1000e: Fix header includes
From: Kok, Auke @ 2007-08-09 23:15 UTC (permalink / raw)
  To: Chuck Ebbert, jeff; +Cc: akpm, netdev, john.ronciak
In-Reply-To: <46BB9DEA.9060406@redhat.com>

Chuck Ebbert wrote:
> On 08/08/2007 07:15 PM, Auke Kok wrote:
>> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
>> ---
>>
>>  drivers/net/e1000e/82571.c   |    4 ++++
>>  drivers/net/e1000e/e1000.h   |    7 ++++---
>>  drivers/net/e1000e/es2lan.c  |    5 +++++
>>  drivers/net/e1000e/ethtool.c |    3 ++-
>>  drivers/net/e1000e/hw.h      |    2 ++
>>  drivers/net/e1000e/ich8lan.c |    5 +++++
>>  drivers/net/e1000e/lib.c     |    2 ++
>>  drivers/net/e1000e/netdev.c  |    1 +
>>  8 files changed, 25 insertions(+), 4 deletions(-)
>>
> 
> Still broken on ppc:
> 
> drivers/net/e1000e/phy.c: In function 'e1000e_copper_link_setup_igp':
> drivers/net/e1000e/phy.c:512: error: implicit declaration of function 'msleep'
> drivers/net/e1000e/phy.c: In function 'e1000e_phy_has_link_generic':
> drivers/net/e1000e/phy.c:1339: error: implicit declaration of function 'mdelay'


ok, time to install git manually on my 4-year-old yellowdog ppc install :@

I'll resend a new patch to Jeff once I have this sorted out.


Auke

^ permalink raw reply

* Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Sean Hefty @ 2007-08-09 23:22 UTC (permalink / raw)
  To: David Miller; +Cc: swise, rdreier, netdev, linux-kernel, general
In-Reply-To: <20070809.145534.102938208.davem@davemloft.net>

> How about we just remove the RDMA stack altogether?  I am not at all
> kidding.  If you guys can't stay in your sand box and need to cause
> problems for the normal network stack, it's unacceptable.  We were
> told all along the if RDMA went into the tree none of this kind of
> stuff would be an issue.

There are currently two RDMA solutions available.  Each solution has 
different requirements and uses the normal network stack differently. 
Infiniband uses its own transport.  iWarp runs over TCP.

We have tried to leverage the existing infrastructure where it makes sense.

> After TCP port reservation, what's next?  It seems an at least
> bi-monthly event that the RDMA folks need to put their fingers
> into something else in the normal networking stack.  No more.

Currently, the RDMA stack uses its own port space.  This causes a 
problem for iWarp, and is what Steve is looking for a solution for.  I'm 
not an iWarp guru, so I don't know what options exist.  Can iWarp use 
its own address family?  Identify specific IP addresses for iWarp use? 
Restrict iWarp to specific port numbers?  Let the app control the 
correct operation?  I don't know.

Steve merely defined a problem and suggested a possible solution.  He's 
looking for constructive help trying to solve the problem.

- Sean

^ permalink raw reply

* Re: 2.6.23-rc2-mm1: rtl8139 inconsistent lock state
From: Mariusz Kozlowski @ 2007-08-09 23:49 UTC (permalink / raw)
  To: Andrew Morton, netdev, Jeff Garzik; +Cc: linux-kernel
In-Reply-To: <20070809015106.cd0bfc53.akpm@linux-foundation.org>

Hello,

=================================
[ INFO: inconsistent lock state ]
2.6.23-rc2-mm1 #7
---------------------------------
inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
ifconfig/5492 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&tp->lock){+...}, at: [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too]
{in-hardirq-W} state was registered at:
  [<c0138eeb>] __lock_acquire+0x949/0x11ac
  [<c01397e7>] lock_acquire+0x99/0xb2
  [<c0452ff3>] _spin_lock+0x35/0x42
  [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too]
  [<c0147a5d>] handle_IRQ_event+0x28/0x59
  [<c01493ca>] handle_level_irq+0xad/0x10b
  [<c0105a13>] do_IRQ+0x93/0xd0
  [<c010441e>] common_interrupt+0x2e/0x34
  [<c0370d2b>] cpuidle_idle_call+0x74/0x99
  [<c01023e7>] cpu_idle+0x87/0x89
  [<c044fa24>] rest_init+0x60/0x62
  [<c05f8ad5>] start_kernel+0x23a/0x2c5
  [<00000000>] 0x0
  [<ffffffff>] 0xffffffff
irq event stamp: 1777
hardirqs last  enabled at (1777): [<c0169c04>] kfree+0xee/0x105
hardirqs last disabled at (1776): [<c0169b9d>] kfree+0x87/0x105
softirqs last  enabled at (1756): [<c03cb7ab>] dev_deactivate+0x86/0xa5
softirqs last disabled at (1754): [<c045300e>] _spin_lock_bh+0xe/0x47

other info that might help us debug this:
1 lock held by ifconfig/5492:
 #0:  (rtnl_mutex){--..}, at: [<c0451778>] mutex_lock+0x1c/0x1f

stack backtrace:
 [<c0104869>] show_trace_log_lvl+0x1a/0x30
 [<c01053ad>] show_trace+0x12/0x14
 [<c0105515>] dump_stack+0x15/0x17
 [<c0136ed9>] print_usage_bug+0x145/0x14f
 [<c0137ce7>] mark_lock+0x61f/0x70c
 [<c0138ce0>] __lock_acquire+0x73e/0x11ac
 [<c01397e7>] lock_acquire+0x99/0xb2
 [<c0452ff3>] _spin_lock+0x35/0x42
 [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too]
 [<c01480fd>] free_irq+0x11b/0x146
 [<de871d59>] rtl8139_close+0x8a/0x14a [8139too]
 [<c03bde63>] dev_close+0x57/0x74
 [<c03bce65>] dev_change_flags+0x8e/0x190
 [<c03fd37c>] devinet_ioctl+0x4af/0x652
 [<c03fdc62>] inet_ioctl+0x56/0x71
 [<c03b1dba>] sock_ioctl+0xa5/0x1d4
 [<c0178b42>] do_ioctl+0x22/0x71
 [<c0178be6>] vfs_ioctl+0x55/0x29e
 [<c0178e62>] sys_ioctl+0x33/0x69
 [<c01041aa>] sysenter_past_esp+0x5f/0x99
 =======================

Regards,

	Mariusz

^ permalink raw reply

* [PATCH 04/10] sysctl: Fix neighbour table sysctls.
From: Eric W. Biederman @ 2007-08-10  0:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, devel, Alexey Dobriyan, netdev, David Miller
In-Reply-To: <m14pj89pmt.fsf_-_@ebiederm.dsl.xmission.com>


- In ipv6 ndisc_ifinfo_syctl_change so it doesn't depend on binary
  sysctl names for a function that works with proc.

- In neighbour.c reorder the table to put the possibly unused entries
  at the end so we can remove them by terminating the table early.

- In neighbour.c kill the entries with questionable binary sysctl
  handling behavior.

- In neighbour.c if we don't have a strategy routine remove the
  binary path.  So we don't the default sysctl strategy routine
  on data that is not ready for it.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/neighbour.c |   75 ++++++++++++++++++++++++++------------------------
 net/ipv6/ndisc.c     |   24 ++++++---------
 2 files changed, 49 insertions(+), 50 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index ca2a153..27c3f4e 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2498,7 +2498,6 @@ static struct neigh_sysctl_table {
 			.proc_handler	= &proc_dointvec,
 		},
 		{
-			.ctl_name	= NET_NEIGH_RETRANS_TIME,
 			.procname	= "retrans_time",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
@@ -2543,27 +2542,40 @@ static struct neigh_sysctl_table {
 			.proc_handler	= &proc_dointvec,
 		},
 		{
-			.ctl_name	= NET_NEIGH_ANYCAST_DELAY,
 			.procname	= "anycast_delay",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= &proc_dointvec_userhz_jiffies,
 		},
 		{
-			.ctl_name	= NET_NEIGH_PROXY_DELAY,
 			.procname	= "proxy_delay",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= &proc_dointvec_userhz_jiffies,
 		},
 		{
-			.ctl_name	= NET_NEIGH_LOCKTIME,
 			.procname	= "locktime",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= &proc_dointvec_userhz_jiffies,
 		},
 		{
+			.ctl_name	= NET_NEIGH_RETRANS_TIME_MS,
+			.procname	= "retrans_time_ms",
+			.maxlen		= sizeof(int),
+			.mode		= 0644,
+			.proc_handler	= &proc_dointvec_ms_jiffies,
+			.strategy	= &sysctl_ms_jiffies,
+		},
+		{
+			.ctl_name	= NET_NEIGH_REACHABLE_TIME_MS,
+			.procname	= "base_reachable_time_ms",
+			.maxlen		= sizeof(int),
+			.mode		= 0644,
+			.proc_handler	= &proc_dointvec_ms_jiffies,
+			.strategy	= &sysctl_ms_jiffies,
+		},
+		{
 			.ctl_name	= NET_NEIGH_GC_INTERVAL,
 			.procname	= "gc_interval",
 			.maxlen		= sizeof(int),
@@ -2592,22 +2604,7 @@ static struct neigh_sysctl_table {
 			.mode		= 0644,
 			.proc_handler	= &proc_dointvec,
 		},
-		{
-			.ctl_name	= NET_NEIGH_RETRANS_TIME_MS,
-			.procname	= "retrans_time_ms",
-			.maxlen		= sizeof(int),
-			.mode		= 0644,
-			.proc_handler	= &proc_dointvec_ms_jiffies,
-			.strategy	= &sysctl_ms_jiffies,
-		},
-		{
-			.ctl_name	= NET_NEIGH_REACHABLE_TIME_MS,
-			.procname	= "base_reachable_time_ms",
-			.maxlen		= sizeof(int),
-			.mode		= 0644,
-			.proc_handler	= &proc_dointvec_ms_jiffies,
-			.strategy	= &sysctl_ms_jiffies,
-		},
+		{}
 	},
 	.neigh_dev = {
 		{
@@ -2660,42 +2657,48 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 	t->neigh_vars[9].data  = &p->anycast_delay;
 	t->neigh_vars[10].data = &p->proxy_delay;
 	t->neigh_vars[11].data = &p->locktime;
+	t->neigh_vars[12].data  = &p->retrans_time;
+	t->neigh_vars[13].data  = &p->base_reachable_time;
 
 	if (dev) {
 		dev_name_source = dev->name;
 		t->neigh_dev[0].ctl_name = dev->ifindex;
-		t->neigh_vars[12].procname = NULL;
-		t->neigh_vars[13].procname = NULL;
-		t->neigh_vars[14].procname = NULL;
-		t->neigh_vars[15].procname = NULL;
+		/* Terminate the table early */
+		memset(&t->neigh_vars[14], 0, sizeof(t->neigh_vars[14]));
 	} else {
 		dev_name_source = t->neigh_dev[0].procname;
-		t->neigh_vars[12].data = (int *)(p + 1);
-		t->neigh_vars[13].data = (int *)(p + 1) + 1;
-		t->neigh_vars[14].data = (int *)(p + 1) + 2;
-		t->neigh_vars[15].data = (int *)(p + 1) + 3;
+		t->neigh_vars[14].data = (int *)(p + 1);
+		t->neigh_vars[15].data = (int *)(p + 1) + 1;
+		t->neigh_vars[16].data = (int *)(p + 1) + 2;
+		t->neigh_vars[17].data = (int *)(p + 1) + 3;
 	}
 
-	t->neigh_vars[16].data  = &p->retrans_time;
-	t->neigh_vars[17].data  = &p->base_reachable_time;
 
 	if (handler || strategy) {
 		/* RetransTime */
 		t->neigh_vars[3].proc_handler = handler;
 		t->neigh_vars[3].strategy = strategy;
 		t->neigh_vars[3].extra1 = dev;
+		if (!strategy)
+			t->neigh_vars[3].ctl_name = CTL_UNNUMBERED;
 		/* ReachableTime */
 		t->neigh_vars[4].proc_handler = handler;
 		t->neigh_vars[4].strategy = strategy;
 		t->neigh_vars[4].extra1 = dev;
+		if (!strategy)
+			t->neigh_vars[4].ctl_name = CTL_UNNUMBERED;
 		/* RetransTime (in milliseconds)*/
-		t->neigh_vars[16].proc_handler = handler;
-		t->neigh_vars[16].strategy = strategy;
-		t->neigh_vars[16].extra1 = dev;
+		t->neigh_vars[12].proc_handler = handler;
+		t->neigh_vars[12].strategy = strategy;
+		t->neigh_vars[12].extra1 = dev;
+		if (!strategy)
+			t->neigh_vars[12].ctl_name = CTL_UNNUMBERED;
 		/* ReachableTime (in milliseconds) */
-		t->neigh_vars[17].proc_handler = handler;
-		t->neigh_vars[17].strategy = strategy;
-		t->neigh_vars[17].extra1 = dev;
+		t->neigh_vars[13].proc_handler = handler;
+		t->neigh_vars[13].strategy = strategy;
+		t->neigh_vars[13].extra1 = dev;
+		if (!strategy)
+			t->neigh_vars[13].ctl_name = CTL_UNNUMBERED;
 	}
 
 	dev_name = kstrdup(dev_name_source, GFP_KERNEL);
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 0358e60..d388429 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1570,30 +1570,26 @@ int ndisc_ifinfo_sysctl_change(struct ctl_table *ctl, int write, struct file * f
 	struct inet6_dev *idev;
 	int ret;
 
-	if (ctl->ctl_name == NET_NEIGH_RETRANS_TIME ||
-	    ctl->ctl_name == NET_NEIGH_REACHABLE_TIME)
+	if ((strcmp(ctl->procname, "retrans_time") == 0) ||
+	    (strcmp(ctl->procname, "base_reachable_time") == 0))
 		ndisc_warn_deprecated_sysctl(ctl, "syscall", dev ? dev->name : "default");
 
-	switch (ctl->ctl_name) {
-	case NET_NEIGH_RETRANS_TIME:
+	if (strcmp(ctl->procname, "retrans_time") == 0)
 		ret = proc_dointvec(ctl, write, filp, buffer, lenp, ppos);
-		break;
-	case NET_NEIGH_REACHABLE_TIME:
+
+	else if (strcmp(ctl->procname, "base_reachable_time") == 0)
 		ret = proc_dointvec_jiffies(ctl, write,
 					    filp, buffer, lenp, ppos);
-		break;
-	case NET_NEIGH_RETRANS_TIME_MS:
-	case NET_NEIGH_REACHABLE_TIME_MS:
+
+	else if ((strcmp(ctl->procname, "retrans_time_ms") == 0) ||
+		 (strcmp(ctl->procname, "base_reacable_time_ms") == 0))
 		ret = proc_dointvec_ms_jiffies(ctl, write,
 					       filp, buffer, lenp, ppos);
-		break;
-	default:
+	else
 		ret = -1;
-	}
 
 	if (write && ret == 0 && dev && (idev = in6_dev_get(dev)) != NULL) {
-		if (ctl->ctl_name == NET_NEIGH_REACHABLE_TIME ||
-		    ctl->ctl_name == NET_NEIGH_REACHABLE_TIME_MS)
+		if (ctl->data == &idev->nd_parms->base_reachable_time)
 			idev->nd_parms->reachable_time = neigh_rand_reach_time(idev->nd_parms->base_reachable_time);
 		idev->tstamp = jiffies;
 		inet6_ifinfo_notify(RTM_NEWLINK, idev);
-- 
1.5.1.1.181.g2de0


^ permalink raw reply related

* [PATCH 05/10] sysctl: ipv6 route flushing (kill binary path)
From: Eric W. Biederman @ 2007-08-10  0:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Alexey Dobriyan, netdev, David Miller
In-Reply-To: <m1zm108axi.fsf_-_@ebiederm.dsl.xmission.com>

We don't preoperly support the sysctl binary path for flushing
the ipv6 routes.  So remove support for a binary path.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/ipv6/route.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 55ea80f..0d23a46 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2458,7 +2458,6 @@ int ipv6_sysctl_rtcache_flush(ctl_table *ctl, int write, struct file * filp,

 ctl_table ipv6_route_table[] = {
 	{
-		.ctl_name	=	NET_IPV6_ROUTE_FLUSH,
 		.procname	=	"flush",
 		.data		=	&flush_delay,
 		.maxlen		=	sizeof(int),
-- 
1.5.1.1.181.g2de0

^ permalink raw reply related

* [PATCH 06/10] sysctl: Remove broken sunrpc debug binary sysctls
From: Eric W. Biederman @ 2007-08-10  0:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Alexey Dobriyan, netdev, David Miller,
	trond.myklebust
In-Reply-To: <m1vebo8avf.fsf_-_@ebiederm.dsl.xmission.com>

>From ddf280c9de903f1fb5d4ecf9c68df0c479d7c7d2 Mon Sep 17 00:00:00 2001
From: Eric W. Biederman <ebiederm@xmission.com>
Date: Thu, 9 Aug 2007 16:00:00 -0600
Subject: 

This is debug code so no need to support binary sysctl,
and the binary sysctls as they were written were not
consistent with what showed up in /proc so remove
the binary sysctl support.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/sunrpc/sysctl.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/sysctl.c b/net/sunrpc/sysctl.c
index 738db32..864b541 100644
--- a/net/sunrpc/sysctl.c
+++ b/net/sunrpc/sysctl.c
@@ -114,7 +114,6 @@ done:
 
 static ctl_table debug_table[] = {
 	{
-		.ctl_name	= CTL_RPCDEBUG,
 		.procname	= "rpc_debug",
 		.data		= &rpc_debug,
 		.maxlen		= sizeof(int),
@@ -122,7 +121,6 @@ static ctl_table debug_table[] = {
 		.proc_handler	= &proc_dodebug
 	},
 	{
-		.ctl_name	= CTL_NFSDEBUG,
 		.procname	= "nfs_debug",
 		.data		= &nfs_debug,
 		.maxlen		= sizeof(int),
@@ -130,7 +128,6 @@ static ctl_table debug_table[] = {
 		.proc_handler	= &proc_dodebug
 	},
 	{
-		.ctl_name	= CTL_NFSDDEBUG,
 		.procname	= "nfsd_debug",
 		.data		= &nfsd_debug,
 		.maxlen		= sizeof(int),
@@ -138,7 +135,6 @@ static ctl_table debug_table[] = {
 		.proc_handler	= &proc_dodebug
 	},
 	{
-		.ctl_name	= CTL_NLMDEBUG,
 		.procname	= "nlm_debug",
 		.data		= &nlm_debug,
 		.maxlen		= sizeof(int),
-- 
1.5.1.1.181.g2de0


^ permalink raw reply related

* [PATCH 09/10] sysctl: ipv4 remove binary sysctl paths where they are broken.
From: Eric W. Biederman @ 2007-08-10  1:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Alexey Dobriyan, David Miller, netdev
In-Reply-To: <m1ir7o8ap1.fsf_-_@ebiederm.dsl.xmission.com>

Currently tcp_available_congestion_control does not even
attempt being read from sys_sysctl, and ipfrag_max_dist
while it works allows setting of invalid values using
sys_sysctl.

So just kill the binary sys_sysctl support for these
sysctls.  If the support is not important enough to
test and get right it probably isn't important enough
to keep.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/ipv4/sysctl_net_ipv4.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 53ef0f4..282eb7e 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -672,7 +672,6 @@ ctl_table ipv4_table[] = {
 		.strategy	= &sysctl_jiffies
 	},
 	{
-		.ctl_name	= NET_IPV4_IPFRAG_MAX_DIST,
 		.procname	= "ipfrag_max_dist",
 		.data		= &sysctl_ipfrag_max_dist,
 		.maxlen		= sizeof(int),
@@ -797,7 +796,6 @@ ctl_table ipv4_table[] = {
 	},
 #endif /* CONFIG_NETLABEL */
 	{
-		.ctl_name	= NET_TCP_AVAIL_CONG_CONTROL,
 		.procname	= "tcp_available_congestion_control",
 		.maxlen		= TCP_CA_BUF_MAX,
 		.mode		= 0444,
-- 
1.5.1.1.181.g2de0

^ permalink raw reply related

* [PATCH 06/10] sysctl: Remove broken sunrpc debug binary sysctls
From: Eric W. Biederman @ 2007-08-10  1:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Alexey Dobriyan, netdev, David Miller,
	trond.myklebust
In-Reply-To: <m1vebo8avf.fsf_-_@ebiederm.dsl.xmission.com>


This is debug code so no need to support binary sysctl,
and the binary sysctls as they were written were not
consistent with what showed up in /proc so remove
the binary sysctl support.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/sunrpc/sysctl.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/sysctl.c b/net/sunrpc/sysctl.c
index 738db32..864b541 100644
--- a/net/sunrpc/sysctl.c
+++ b/net/sunrpc/sysctl.c
@@ -114,7 +114,6 @@ done:
 
 static ctl_table debug_table[] = {
 	{
-		.ctl_name	= CTL_RPCDEBUG,
 		.procname	= "rpc_debug",
 		.data		= &rpc_debug,
 		.maxlen		= sizeof(int),
@@ -122,7 +121,6 @@ static ctl_table debug_table[] = {
 		.proc_handler	= &proc_dodebug
 	},
 	{
-		.ctl_name	= CTL_NFSDEBUG,
 		.procname	= "nfs_debug",
 		.data		= &nfs_debug,
 		.maxlen		= sizeof(int),
@@ -130,7 +128,6 @@ static ctl_table debug_table[] = {
 		.proc_handler	= &proc_dodebug
 	},
 	{
-		.ctl_name	= CTL_NFSDDEBUG,
 		.procname	= "nfsd_debug",
 		.data		= &nfsd_debug,
 		.maxlen		= sizeof(int),
@@ -138,7 +135,6 @@ static ctl_table debug_table[] = {
 		.proc_handler	= &proc_dodebug
 	},
 	{
-		.ctl_name	= CTL_NLMDEBUG,
 		.procname	= "nlm_debug",
 		.data		= &nlm_debug,
 		.maxlen		= sizeof(int),
-- 
1.5.1.1.181.g2de0


^ permalink raw reply related

* Re: [PATCH 1/1] NFS: change the ip_map cache code to handle IPv6 addresses
From: Neil Brown @ 2007-08-10  1:06 UTC (permalink / raw)
  To: Aurélien Charbon; +Cc: netdev ML, Mailing list NFSv4
In-Reply-To: <46BAC0B9.1020207@ext.bull.net>

On Thursday August 9, aurelien.charbon@ext.bull.net wrote:
> Here is a small part of missing pieces of IPv6 support for the server.
> It deals with the ip_map caching code part.
> 
> It changes the ip_map structure to be able to store INET6 addresses.
> It adds also the changes in address hashing, and mapping to test it with 
> INET addresses.

Thanks - this generally seems to make sense.  A few specific comments:

> @@ -1558,6 +1558,7 @@

Please use "diff -p" to include the C function name.  It makes the
patch easier to read.


> +    for (i = 0; i < ncp->cl_naddr; i++) {
> +        /* Mapping address */
> +            addr6.s6_addr32[0] = 0;
> +            addr6.s6_addr32[1] = 0;
> +            addr6.s6_addr32[2] = htonl(0xffff);
> +            addr6.s6_addr32[3] = (uint32_t)ncp->cl_addrlist[i].s_addr;
> +        auth_unix_add_addr(addr6, dom);
> +    }

You do this conversion several times, and each time it is indented
strangely...  Maybe create an inline (if there isn't one already) to
do this.

>  extern void auth_domain_put(struct auth_domain *item);
> -extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain 
> *dom);
> +extern int auth_unix_add_addr(struct in6_addr addr, struct auth_domain 
> *dom);

Your mailer is wrapping long lines.  This is bad.

>  static int ip_map_match(struct cache_head *corig, struct cache_head *cnew)
>  {
>      struct ip_map *orig = container_of(corig, struct ip_map, h);
>      struct ip_map *new = container_of(cnew, struct ip_map, h);
>      return strcmp(orig->m_class, new->m_class) == 0
> -        && orig->m_addr.s_addr == new->m_addr.s_addr;
> +        && memcmp(orig->m_addr.s6_addr, 
> new->m_addr.s6_addr,sizeof(struct in6_addr));
>  }

I think you have inverted the sense of the test.  The original tests
for equality.  The new tests for inequality.  For memcmp (and strcmp),
always use   "function(arg1, argc) COMPARE_OP 0"
e.g.
            memcmp(orig, new, size) == 0
or use ipv6_addr_equal.


Also please run ./scripts/checkpatch.pl on the patch.  You need a
space after that comma.

   
> @@ -125,7 +129,7 @@
>      struct ip_map *item = container_of(citem, struct ip_map, h);
>  
>      strcpy(new->m_class, item->m_class);
> -    new->m_addr.s_addr = item->m_addr.s_addr;
> +    memcpy(&new->m_addr.s6_addr, 
> &item->m_addr.s6_addr,sizeof(item->m_addr.s6_addr));

Ditto..... did you test this code?


>  }
>  static void update(struct cache_head *cnew, struct cache_head *citem)
>  {
> @@ -151,20 +155,28 @@
>  {
>      char text_addr[20];
>      struct ip_map *im = container_of(h, struct ip_map, h);
> -    __be32 addr = im->m_addr.s_addr;
> +   
> +    __be32 addr[4];
> +    int i;
> +    for (i=0;i<4;i++)
> +        addr[i] = im->m_addr.s6_addr[i];
                                   ^^^^

I think you mean "s6_addr32" ??


>  
> -    snprintf(text_addr, 20, "%u.%u.%u.%u",
> -         ntohl(addr) >> 24 & 0xff,
> -         ntohl(addr) >> 16 & 0xff,
> -         ntohl(addr) >>  8 & 0xff,
> -         ntohl(addr) >>  0 & 0xff);
> +    snprintf(text_addr, 20, "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x",
> +            ntohl(addr[3]) >> 16 & 0xff,
                                      ^^^^
should be 0xffff ???

Now I know you didn't test the code :-)

> +            ntohl(addr[3]) >>  0 & 0xff,
> +            ntohl(addr[2]) >> 16 & 0xff,
> +            ntohl(addr[2]) >>  0 & 0xff,
> +            ntohl(addr[1]) >> 16 & 0xff,
> +            ntohl(addr[1]) >>  0 & 0xff,
> +            ntohl(addr[0]) >> 16 & 0xff,
> +            ntohl(addr[0]) >>  0 & 0xff);

This code would read better if you did
   __be16 addr[8];
   for (i = 0; i < 8 ; i++)
	addr[i] = im->m_addr.s6_addr16[i];
   snprintf(........
		ntohs(addr[7]),
		ntohs(addr[6]),
		....

Also, I think that if it is a (converted) IPv4 address, it should be
printed as a dotted-quad.


> +               addr.s6_addr32[0] = htonl((b1<<16)|b2);
> +               addr.s6_addr32[1] = htonl((b3<<16)|b4);
> +               addr.s6_addr32[2] = htonl((b5<<16)|b6);
> +               addr.s6_addr32[3] = htonl((b7<<16)|b8);

Assign to addr.s6_addr16[].

> -    seq_printf(m, "%s %d.%d.%d.%d %s\n",
> +    seq_printf(m, "%s %04x.%04x.%04x.%04x.%04x.%04x.%04x.%04x %s\n",
>             im->m_class,
> -           ntohl(addr.s_addr) >> 24 & 0xff,
> -           ntohl(addr.s_addr) >> 16 & 0xff,
> -           ntohl(addr.s_addr) >>  8 & 0xff,
> -           ntohl(addr.s_addr) >>  0 & 0xff,
> +           ntohl(addr.s6_addr32[3]) >> 16 & 0xffff,
> +           ntohl(addr.s6_addr32[3]) & 0xffff,
> +           ntohl(addr.s6_addr32[2]) >> 16 & 0xffff,
> +           ntohl(addr.s6_addr32[2]) & 0xffff,
> +           ntohl(addr.s6_addr32[1]) >> 16 & 0xffff,
> +           ntohl(addr.s6_addr32[1]) & 0xffff,
> +           ntohl(addr.s6_addr32[0]) >> 16 & 0xffff,
> +           ntohl(addr.s6_addr32[0]) & 0xffff,

Again, s6_addr16[], and keep the dotted-quad for IPv4.
In fact ( borrowing Chuck's suggestion)
 if (is_ipv4)
	seq_printf(m, "%s "NIPQUAD_FMT" %s\n", im->m_class,
			NIPQUAD((addr.s6_addr32[0])), ....);
 else
	seq_printf(m, "%s "NIP6_FMT" %s\n", im->m_class,
			NIP6(addr), ...);



Thanks,
NeilBrown

^ permalink raw reply

* Potential u32 classifier bug.
From: Waskiewicz Jr, Peter P @ 2007-08-10  1:07 UTC (permalink / raw)
  To: netdev

I've recently been trying to use tc with u32 classifiers to filter
ethernet traffic based on ethertype.  Although we found and fixed an
issue in the sch_prio call to tc_classify() (thanks Patrick!), this
didn't fix the actual filtering issue.  For those of you who are curious
or are tc-savy, I'm really in a bind and need some help.  This is what I
have so far:

Filter to identify and move traffic to a different flow:

# tc qdisc add dev eth2 root handle 1: rr bands 8 priomap 7 7 6 6 5 5 4
4 3 3 2 2 1 1 0 0 multiqueue
# tc filter add dev eth2 parent 1: protocol 802_3 prio 1 u32 match u32
0x00000800 0x0000ffff at 12 flowid 1:6

Now this hits tc_classify(), and tp->protocol and skb->protocol match
(be16 of 8 - ETH_P_802_3, which is what I expect), so u32_classify() is
called through the tp->classify() func pointer.  This is where things
get weird.

In net/sched/cls_u32.c, u32_classify() grabs a reference to part of the
network header.  This is question number one: how can the filter look at
the ethernet (mac) header if it's grabbing a reference to the network
header:

	u8 *ptr = skb_network_header(skb);

I would think that for a protocol of 802.3, one would want:

	u8 *ptr = skb_mac_header(skb);

Changing this though doesn't fix the filter.  Further down is a rather
horrible match criteria to make sure the filter is looking at the right
data before it applies the whole filter list on the skb:

	if
((*(u32*)(ptr+key->off+(off2&key->offmask))^key->val)&key->mask) {

Now if this matches (meaning it evaluates to zero), we can move on.  If
not, we go to the next key node and try again.  Run out of key nodes, we
return -1 and take the default mapping from IP TOS to Linux priority,
and get queued to a band.

My big question is: Has anyone recently used the 802_3 protocol in tc
with u32 and actually gotten it to work?  I can't see how the
u32_classify() code can look at the mac header, since it is using the
network header accessor to start looking.  I think this is an issue with
the classification code, but I'm looking to see if I'm doing something
stupid before I really start digging into this mess.

Thanks in advance,

PJ Waskiewicz
Intel Corporation
peter.p.waskiewicz.jr@intel.com <mailto:peter.p.waskiewicz.jr@intel.com>

^ permalink raw reply

* Re: [PATCH 1/1] NFS: change the ip_map cache code to handle IPv6 addresses
From: Neil Brown @ 2007-08-10  1:11 UTC (permalink / raw)
  To: chuck.lever; +Cc: Aurélien Charbon, Mailing list NFSv4, netdev ML
In-Reply-To: <46BB2F65.4080602@oracle.com>

On Thursday August 9, chuck.lever@oracle.com wrote:
> >>
> >> How have you tested the effectiveness of the new hash function?
> > 
> > I have not tested that point but I can easily imagine there are better 
> > solutions.
> > Perhaps we can keep the same function for an IPv4 address (only taking 
> > the 32 bits of IPv4 addr), and then design one for IPv6 addresses.
> 
> I see that, to generate the hash, you would be xor-ing the FF and 00 
> bytes in the canonicalized IPv4 address.  Yes, perhaps a better function 
> is needed, or as you say, one specifically for IPv6 and one for 
> canonicalized IPv4.
> 

I suspect that the given hash function will be as good as any other.
The values in each byte of an IPv6 are likely to be either evenly
distributed, or constant.
Each the first case you don't need a clever hash function to
distribute them, while in the second, no hash function can improve the
distribution. 

NeilBrown

^ permalink raw reply

* Re: net-2.6.24 GIT state
From: David Miller @ 2007-08-10  1:27 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev
In-Reply-To: <Pine.LNX.4.64.0708091721030.24369@kivilampi-30.cs.helsinki.fi>

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 9 Aug 2007 17:40:07 +0300 (EEST)

> Next step:
> 
> http://www.cs.helsinki.fi/u/ijjarvin/tcp-rebase/after-reorder/
> 
> ...I put some effort to reorganization and there's the result, please 
> _don't_ take the new SACK block-validator yet as I haven't yet put it 
> under any testing (it's the only difference this reordering of patches 
> seems to have introduced :-)). The first five commits touch left_out (4 
> latest fix bug and "flaws" I made in the first commit), I could combine 
> them into a single one if you want to.

I applied everything up until the SACK validator to net-2.6.24

Thanks for doing all of this merging work, I very much appreciate
it.

> ...Notice that part of the next step includes work that hasn't yet been 
> posted to netdev at all (was from my not yet submitted wip portion of 
> it). Would you prefer me to post them using the usual procedure?

Everything I hit today which had not been posted before was trivial
fix or a reasonable small cleanup.

Why don't you just post the patches to the list one-by-one from
now on so we can review them in parallel with merging into
net-2.6.24?

Thanks again Ilpo!

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox