Netdev List

Netdev List
 help / color / mirror / Atom feed

* [ofa-general] Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
From: Krishna Kumar2 @ 2007-08-09  3:19 UTC (permalink / raw)
  To: Herbert Xu
  Cc: jagana, johnpol, gaagaan, jeff, Robert.Olsson, kumarkr, mcarlson,
	peter.p.waskiewicz.jr, hadi, kaber, netdev, general, mchan, tgraf,
	sri, shemminger, David Miller, rdreier
In-Reply-To: <20070808134247.GA9942@gondor.apana.org.au>

Herbert Xu <herbert@gondor.apana.org.au> wrote on 08/08/2007 07:12:47 PM:

> On Wed, Aug 08, 2007 at 03:49:00AM -0700, David Miller wrote:
> >
> > Not because I think it obviates your work, but rather because I'm
> > curious, could you test a TSO-in-hardware driver converted to
> > batching and see how TSO alone compares to batching for a pure
> > TCP workload?
>
> You could even lower the bar by disabling TSO and enabling
> software GSO.

I will try with E1000 (though I didn't see improvement when I tested a long
time back). The difference I expect is that TSO would help with large
packets and not necessarily small/medium packets and not definitely in
the case of multiple different skbs (as opposed to single large skb)
getting
queue'd. I think these are two different workloads.

> > I personally don't think it will help for that case at all as
> > TSO likely does better job of coalescing the work _and_ reducing
> > bus traffic as well as work in the TCP stack.
>
> I agree.  I suspect the bulk of the effort is in getting
> these skb's created and processed by the stack so that by
> the time that they're exiting the qdisc there's not much
> to be saved anymore.

However, I am getting a large improvement for IPoIB specifically for this
same case. The reason - batching will help only when queue gets full and
stopped (and to a lesser extent if tx lock was not got, which results
in fewer amount of batching that can be done).

thanks,

- KK

^ permalink raw reply

* Re: [PATCH 3/9 Rev3] [sched] Modify qdisc_run to support batching
From: Krishna Kumar2 @ 2007-08-09  3:13 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: davem, gaagaan, general, hadi, herbert, jagana, jeff, kaber,
	kumarkr, mcarlson, mchan, netdev, peter.p.waskiewicz.jr, rdreier,
	rick.jones2, Robert.Olsson, shemminger, sri, tgraf, xma
In-Reply-To: <20070808121401.GB8478@2ka.mipt.ru>

Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote on 08/08/2007 05:44:02 PM:

> On Wed, Aug 08, 2007 at 03:01:45PM +0530, Krishna Kumar
(krkumar2@in.ibm.com) wrote:
> > +static inline int get_skb(struct net_device *dev, struct Qdisc *q,
> > +           struct sk_buff_head *blist, struct sk_buff **skbp)
> > +{
> > +   if (likely(!blist || (!skb_queue_len(blist) && qdisc_qlen(q) <=
1))) {
> > +      return likely((*skbp = dev_dequeue_skb(dev, q)) != NULL);
> > +   } else {
> > +      int max = dev->tx_queue_len - skb_queue_len(blist);
> > +      struct sk_buff *skb;
> > +
> > +      while (max > 0 && (skb = dev_dequeue_skb(dev, q)) != NULL)
> > +         max -= dev_add_skb_to_blist(skb, dev);
> > +
> > +      *skbp = NULL;
> > +      return 1;   /* we have atleast one skb in blist */
> > +   }
> > +}
>
> Same here - is it possible to get a list in one go instead of pulling
> one-by-one, since it forces quite a few additional unneded lock
> get/releases. What about dev_dequeue_number_skb(dev, q, num), which will
> grab the lock and move a list of skbs from one queue to provided head.

OK, I will try this out.

> > @@ -158,7 +198,10 @@ static inline int qdisc_restart(struct n
> >     /* And release queue */
> >     spin_unlock(&dev->queue_lock);
> >
> > -   ret = dev_hard_start_xmit(skb, dev);
> > +   if (likely(skb))
> > +      ret = dev_hard_start_xmit(skb, dev);
> > +   else
> > +      ret = dev->hard_start_xmit_batch(dev);
>
> Perfectionism says that having array of two functions and calling one of
> them via array_func_pointer[!!skb] will be much faster. Just a though.
> It is actually much faster than if/else on x86 at least.

Thinking about this - I will have to store the 2 pointer array in dev
itself wasting some space, and also fn pointer will have wrong signature
as one takes an extra argument. Will ponder some more :)

thanks,

- KK


^ permalink raw reply

* [ofa-general] Re: [PATCH 2/9 Rev3] [core] Add skb_blist & hard_start_xmit_batch
From: Krishna Kumar2 @ 2007-08-09  3:09 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: jagana, mcarlson, herbert, gaagaan, Robert.Olsson, kumarkr,
	rdreier, peter.p.waskiewicz.jr, hadi, kaber, jeff, general, mchan,
	tgraf, netdev, shemminger, davem, sri
In-Reply-To: <20070808120142.GA26884@2ka.mipt.ru>

Hi Evgeniy,

Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote on 08/08/2007 05:31:43 PM:

> > +int dev_change_tx_batch_skb(struct net_device *dev, unsigned long
new_batch_skb)
> > +{
> > +   int ret = 0;
> > +   struct sk_buff_head *blist;
> > +
> > +   if (!dev->hard_start_xmit_batch) {
> > +      /* Driver doesn't support batching skb API */
> > +      ret = -ENOTSUPP;
> > +      goto out;
> > +   }
> > +
> > +   /* Handle invalid argument */
> > +   if (new_batch_skb < 0) {
> > +      ret = -EINVAL;
> > +      goto out;
> > +   }
> It is unsigned, how can it be less than zero?

Yuck, originally I had it as int and changed to ulong and forgot to
remove this check.

> And actually you use it just like a binary flag (casted to/from u32 in
> the code, btw), so why not using ethtool_value directly here?

I still need to check if the value is changing, so the one check is needed.
Later I am using it as a value directly.

> > +   /* Check if new value is same as the current */
> > +   if (!!dev->skb_blist == !!new_batch_skb)
> > +      goto out;
> > +
> > +   if (new_batch_skb &&
> > +       (blist = kmalloc(sizeof *blist, GFP_KERNEL)) == NULL) {
> > +      ret = -ENOMEM;
> > +      goto out;
> > +   }
> > +
> > +   spin_lock(&dev->queue_lock);
> > +   if (new_batch_skb) {
> > +      skb_queue_head_init(blist);
> > +      dev->skb_blist = blist;
> > +   } else
> > +      free_batching(dev);
> > +   spin_unlock(&dev->queue_lock);
>
> This needs bh lock too, since blist is accessed by qdisc_restart.

Yes, had it in the code, put it in the list of changes, but missed it for
some reason :(

> > +int dev_add_skb_to_blist(struct sk_buff *skb, struct net_device *dev)
> > +{
> > +   if (!list_empty(&ptype_all))
> > +      dev_queue_xmit_nit(skb, dev);
> > +
> > +   if (netif_needs_gso(dev, skb)) {
> > +      if (unlikely(dev_gso_segment(skb))) {
> > +         kfree(skb);
> > +         return 0;
> > +      }
> > +
> > +      if (skb->next) {
> > +         int count = 0;
> > +
> > +         do {
> > +            struct sk_buff *nskb = skb->next;
> > +
> > +            skb->next = nskb->next;
> > +            __skb_queue_tail(dev->skb_blist, nskb);
> > +            count++;
> > +         } while (skb->next);
>
> Is it possible to move list without iterating over each entry?

Though I cannot see something obvious to do that, let me see if
something is possible as it will make a good difference.

thanks for your suggestions,

- KK

^ permalink raw reply

* Re: [PATCH] make atomic_t volatile on all architectures
From: David Miller @ 2007-08-09  1:48 UTC (permalink / raw)
  To: herbert
  Cc: csnook, akpm, torvalds, ak, heiko.carstens, linux-kernel, netdev,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx
In-Reply-To: <E1IIwQx-000468-00@gondolin.me.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 09 Aug 2007 09:03:27 +0800

> Such loops should always use something like cpu_relax() which comes
> with a barrier.

This is an excellent point.

And it needs to be weighed with the error prone'ness Andrew mentioned.
There probably is a middle ground somewhere.

^ permalink raw reply

* Re: [PATCH] SSB PCI core driver fixes
From: Aurelien Jarno @ 2007-08-09  0:38 UTC (permalink / raw)
  To: Michael Buesch; +Cc: netdev
In-Reply-To: <200708090227.06510.mb@bu3sch.de>

Michael Buesch a écrit :
> On Thursday 09 August 2007, Aurelien Jarno wrote:
>>  - Add some delay between the configuration of the PCI controller 
>>    and its registration.
> 
> Why? It is _huge_ and people won't like it ;)
> At least add a comment why this is needed.

It is need, otherwise the PCI controller gets confused, which causes a
reset of the machine. Then CFE goes into a loop booting again and again
without being able to get up to the prompt.

I agree this is a huge value, so I will try to find the minimum value
and then add some margin.

I will send an updated patch.

>> @@ -340,6 +345,7 @@
>>  	 * The following needs change, if we want to port hostmode
>>  	 * to non-MIPS platform. */
>>  	set_io_port_base((unsigned long)ioremap_nocache(SSB_PCI_MEM, 0x04000000));
>> +	mdelay(300);
>>  	register_pci_controller(&ssb_pcicore_controller);
>>  }
>>
> 
> 
> 


-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply

* Re: [PATCH] SSB PCI core driver fixes
From: Aurelien Jarno @ 2007-08-09  1:09 UTC (permalink / raw)
  To: Michael Buesch; +Cc: netdev
In-Reply-To: <46BA6200.4060306@aurel32.net>

On Thu, Aug 09, 2007 at 02:38:24AM +0200, Aurelien Jarno wrote:
> Michael Buesch a écrit :
> > On Thursday 09 August 2007, Aurelien Jarno wrote:
> >>  - Add some delay between the configuration of the PCI controller 
> >>    and its registration.
> > 
> > Why? It is _huge_ and people won't like it ;)
> > At least add a comment why this is needed.
> 
> It is need, otherwise the PCI controller gets confused, which causes a
> reset of the machine. Then CFE goes into a loop booting again and again
> without being able to get up to the prompt.
> 
> I agree this is a huge value, so I will try to find the minimum value
> and then add some margin.
> 
> I will send an updated patch.
> 

A few experiments have shown that the minimum value is 3ms on my WGT634U
machine. I haved changed the value to 10ms in the new patch below, which
gives some margin, and added a comment.

Aurelien



The patch below against 2.6.23-rc1-mm2 fixes various things on the SSB
PCI core driver:
 - Correctly write the configuration register value in 
   ssb_extpci_write_config() for len = 1 or len = 2.
 - Set the PCI_LATENCY_TIMER to handle devices on the PCI bus.
 - Set the PCI arbiter control to internal.
 - Add some delay between the configuration of the PCI controller 
   and its registration.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

--- a/drivers/ssb/driver_pcicore.c
+++ b/drivers/ssb/driver_pcicore.c
@@ -99,6 +99,9 @@
 
 	/* Enable PCI bridge BAR1 prefetch and burst */
 	pci_write_config_dword(dev, SSB_BAR1_CONTROL, 3);
+
+	/* Make sure our latency is high enough to handle the devices behind us */
+	pci_write_config_byte(dev, PCI_LATENCY_TIMER, 0xa8);
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, ssb_fixup_pcibridge);
 
@@ -230,7 +233,7 @@
 		val = *((const u32 *)buf);
 		break;
 	}
-	writel(*((const u32 *)buf), mmio);
+	writel(val, mmio);
 
 	err = 0;
 unmap:
@@ -311,6 +314,8 @@
 	udelay(150); /* Assertion time demanded by the PCI standard */
 	val |= SSB_PCICORE_CTL_RST; /* Deassert RST# */
 	pcicore_write32(pc, SSB_PCICORE_CTL, val);
+	val = SSB_PCICORE_ARBCTL_INTERN;
+	pcicore_write32(pc, SSB_PCICORE_ARBCTL, val);
 	udelay(1); /* Assertion time demanded by the PCI standard */
 
 	/*TODO cardbus mode */
@@ -340,6 +345,9 @@
 	 * The following needs change, if we want to port hostmode
 	 * to non-MIPS platform. */
 	set_io_port_base((unsigned long)ioremap_nocache(SSB_PCI_MEM, 0x04000000));
+	/* Give some time to the PCI controller to configure itself with the new
+	 * values. Not waiting at this point causes crashes of the machine. */
+	mdelay(10);
 	register_pci_controller(&ssb_pcicore_controller);
 }
 

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply

* Re: [PATCH] make atomic_t volatile on all architectures
From: Herbert Xu @ 2007-08-09  1:03 UTC (permalink / raw)
  To: Chris Snook
  Cc: akpm, torvalds, ak, heiko.carstens, davem, linux-kernel, netdev,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx
In-Reply-To: <20070808230733.GA17270@shell.boston.redhat.com>

Chris Snook <csnook@redhat.com> wrote:
> 
> Some architectures currently do not declare the contents of an atomic_t to be
> volatile.  This causes confusion since atomic_read() might not actually read
> anything if an optimizing compiler re-uses a value stored in a register, which
> can break code that loops until something external changes the value of an
> atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads

Such loops should always use something like cpu_relax() which comes
with a barrier.

> of all registers used in the loop, thus hurting performance instead of helping
> it, particularly on architectures where it's unnecessary.  Since we generally

Do you have an example of such a loop where performance is hurt by this?

The IPVS code that led to this patch was simply broken and has been
fixed to use cpu_relax().

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] SSB PCI core driver fixes
From: Michael Buesch @ 2007-08-09  0:27 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: netdev
In-Reply-To: <20070809002353.GA4594@hall.aurel32.net>

On Thursday 09 August 2007, Aurelien Jarno wrote:
>  - Add some delay between the configuration of the PCI controller 
>    and its registration.

Why? It is _huge_ and people won't like it ;)
At least add a comment why this is needed.

> @@ -340,6 +345,7 @@
>  	 * The following needs change, if we want to port hostmode
>  	 * to non-MIPS platform. */
>  	set_io_port_base((unsigned long)ioremap_nocache(SSB_PCI_MEM, 0x04000000));
> +	mdelay(300);
>  	register_pci_controller(&ssb_pcicore_controller);
>  }
> 



^ permalink raw reply

* [PATCH] SSB PCI core driver fixes
From: Aurelien Jarno @ 2007-08-09  0:23 UTC (permalink / raw)
  To: Michael Buesch; +Cc: netdev

The patch below against 2.6.23-rc1-mm2 fixes various things on the SSB
PCI core driver:
 - Correctly write the configuration register value in 
   ssb_extpci_write_config() for len = 1 or len = 2.
 - Set the PCI latency timer to handle devices on the PCI bus.
 - Set the PCI arbiter control to internal.
 - Add some delay between the configuration of the PCI controller 
   and its registration.

Note: this is the latest SSB patch from my local tree. Next ones are 
only BCM947xx or CFE related.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

--- a/drivers/ssb/driver_pcicore.c
+++ b/drivers/ssb/driver_pcicore.c
@@ -99,6 +99,9 @@
 
 	/* Enable PCI bridge BAR1 prefetch and burst */
 	pci_write_config_dword(dev, SSB_BAR1_CONTROL, 3);
+
+	/* Make sure our latency is high enough to handle the devices behind us */
+	pci_write_config_byte(dev, PCI_LATENCY_TIMER, 0xa8);
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, ssb_fixup_pcibridge);
 
@@ -230,7 +233,7 @@
 		val = *((const u32 *)buf);
 		break;
 	}
-	writel(*((const u32 *)buf), mmio);
+	writel(val, mmio);
 
 	err = 0;
 unmap:
@@ -311,6 +314,8 @@
 	udelay(150); /* Assertion time demanded by the PCI standard */
 	val |= SSB_PCICORE_CTL_RST; /* Deassert RST# */
 	pcicore_write32(pc, SSB_PCICORE_CTL, val);
+	val = SSB_PCICORE_ARBCTL_INTERN;
+	pcicore_write32(pc, SSB_PCICORE_ARBCTL, val);
 	udelay(1); /* Assertion time demanded by the PCI standard */
 
 	/*TODO cardbus mode */
@@ -340,6 +345,7 @@
 	 * The following needs change, if we want to port hostmode
 	 * to non-MIPS platform. */
 	set_io_port_base((unsigned long)ioremap_nocache(SSB_PCI_MEM, 0x04000000));
+	mdelay(300);
 	register_pci_controller(&ssb_pcicore_controller);
 }

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

^ permalink raw reply

* Re: [patch] ipvs: force read of atomic_t in while loop
From: Andi Kleen @ 2007-08-09  0:15 UTC (permalink / raw)
  To: Chris Snook
  Cc: Heiko Carstens, andi, David Miller, akpm, linux-kernel, netdev,
	schwidefsky, wensong, horms, torvalds
In-Reply-To: <46BA30DC.20207@redhat.com>

On Wed, Aug 08, 2007 at 05:08:44PM -0400, Chris Snook wrote:
> Heiko Carstens wrote:
> >On Wed, Aug 08, 2007 at 03:21:31AM -0700, David Miller wrote:
> >>From: Heiko Carstens <heiko.carstens@de.ibm.com>
> >>Date: Wed, 8 Aug 2007 11:33:00 +0200
> >>
> >>>Just saw this while grepping for atomic_reads in a while loops.
> >>>Maybe we should re-add the volatile to atomic_t. Not sure.
> >>I think whatever the choice, it should be done consistently
> >>on every architecture.
> >>
> >>It's just asking for trouble if your arch does it differently from
> >>every other.
> >
> >Well..currently it's i386/x86_64 and s390 which have no volatile
> >in atomic_t. And yes, of course I agree it should be consistent
> >across all architectures. But it isn't.
> 
> Based on recent discussion, it's pretty clear that there's a lot of 
> confusion about this.  A lot of people (myself included, until I thought 
> about it long and hard) will reasonably assume that calling 
> atomic_read() will actually read the value from memory.  Leaving out the 
> volatile declaration seems like a pessimization to me.  If you force 
> people to use barrier() everywhere they're working with atomic_t, it 
> will force re-reads of all the non-atomic data in use as well, which 
> will cause more memory fetches of things that generally don't need 
> barrier().  That and it's a bug waiting to happen.
> 
> Andi -- your thoughts on the matter?

I also think readding volatile makes sense. An alternative would be
to stick an rmb() into atomic_read() -- that would also stop speculative reads.
Disadvantage is that it clobbers all memory, not just the specific value.

But you really have to complain to Linus (cc'ed). He came up
with the volatile removale change iirc.

-Andi


^ permalink raw reply

* Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
From: Shirley Ma @ 2007-08-09  0:06 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, gaagaan, general, hadi, jeff, johnpol, kaber,
	krkumar2, kumarkr, mcarlson, mchan, netdev, peter.p.waskiewicz.jr,
	rdreier, rick.jones2, Robert.Olsson, shemminger, sri, tgraf,
	Venkata Jagana
In-Reply-To: <20070808134247.GA9942@gondor.apana.org.au>

Hello Herbert,

> > Not because I think it obviates your work, but rather because I'm
> > curious, could you test a TSO-in-hardware driver converted to
> > batching and see how TSO alone compares to batching for a pure
> > TCP workload?
> 
> You could even lower the bar by disabling TSO and enabling
> software GSO.

We had a discuss before. GSO doesn't benefit device which has no HW 
checksum (like IPoIB) by inducing an extra copy. And GSO benefits one 
stream, batching benefits multiple streams.

Thanks
Shirley

^ permalink raw reply

* Re: [PATCH] make atomic_t volatile on all architectures
From: Jesper Juhl @ 2007-08-08 23:51 UTC (permalink / raw)
  To: Chris Snook
  Cc: akpm, torvalds, ak, heiko.carstens, davem, linux-kernel, netdev,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx
In-Reply-To: <46BA524F.2070803@redhat.com>

On 09/08/2007, Chris Snook <csnook@redhat.com> wrote:
> Jesper Juhl wrote:
> > On 09/08/2007, Chris Snook <csnook@redhat.com> wrote:
> >> From: Chris Snook <csnook@redhat.com>
> >>
> >> Some architectures currently do not declare the contents of an atomic_t to be
> >> volatile.  This causes confusion since atomic_read() might not actually read
> >> anything if an optimizing compiler re-uses a value stored in a register, which
> >> can break code that loops until something external changes the value of an
> >> atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
> >> of all registers used in the loop, thus hurting performance instead of helping
> >> it, particularly on architectures where it's unnecessary.  Since we generally
> >> want to re-read the contents of an atomic variable on every access anyway,
> >> let's standardize the behavior across all architectures and avoid the
> >> performance and correctness problems of requiring the use of barrier() in
> >> loops that expect atomic_t variables to change externally.  This is relevant
> >> even on non-smp architectures, since drivers may use atomic operations in
> >> interrupt handlers.
> >>
> >> Signed-off-by: Chris Snook <csnook@redhat.com>
> >>
> >
> > Hmm, I thought we were trying to move away from volatile since it is
> > very weakly defined and towards explicit barriers and locks...
> > Points to --> Documentation/volatile-considered-harmful.txt
>
> This is a special case.

I had a feeling it might be.

> Usually, the use of volatile is just lazy.  In
> this case, it's probably necessary on at least some architectures, so we
> can't remove it everywhere unless we want to rewrite atomic.h completely
> in inline assembler for several architectures, and painstakingly verify
> all that work.

Sounds quite unpleasant and actively harmful - keeping stuff in plain
readable C makes it easier to handle by most people.

> I would hope it's obvious that having consistent
> behavior on all architectures, or even at all compiler optimization
> levels within an architecture, can be agreed to be good.

Agreed, consistency is good.

>  Additionally,
> atomic_t variables are a rare exception, in that we pretty much always
> want to work with the latest value in RAM on every access.  Making this
> atomic will allow us to remove a bunch of barriers which do nothing but
> slow things down on most architectures.
>
I believe you meant to say "volatile" and not "atomic" above.  But
yes, if volatile is sufficiently defined to ensure it does what we
want in this case and using it means we can actually speed up the
kernel, then it does indeed sound reasonable. Especially since it's
confined to the implementation of atomic_t which most people shouldn't
be looking at anyway (but simply use) and when using an atomic_t it's
pretty reasonable to expect that it doesn't need any additional
locking or barriers since it's supposed to be *atomic*.

> I agree that the use of atomic_t in .c files is generally bad, but in
> certain special cases, hiding it inside defined data types may be worth
> the slight impurity, just as we sometimes tolerate lines longer than 80
> columns when "fixing" it makes things unreadable.
>
Can't argue with that.

It seems you've thought it through.  I just wanted to be sure that
this approach was actually the one that makes the most sense, and
you've convinced me of that :-)

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply

* Re: [PATCH] make atomic_t volatile on all architectures
From: Chris Snook @ 2007-08-08 23:35 UTC (permalink / raw)
  To: Lennert Buytenhek
  Cc: akpm, torvalds, ak, heiko.carstens, davem, linux-kernel, netdev,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx
In-Reply-To: <20070808232521.GC8460@xi.wantstofly.org>

Lennert Buytenhek wrote:
> On Wed, Aug 08, 2007 at 07:07:33PM -0400, Chris Snook wrote:
> 
>> From: Chris Snook <csnook@redhat.com>
>>
>> Some architectures currently do not declare the contents of an atomic_t to be
>> volatile.  This causes confusion since atomic_read() might not actually read
>> anything if an optimizing compiler re-uses a value stored in a register, which
>> can break code that loops until something external changes the value of an
>> atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
>> of all registers used in the loop, thus hurting performance instead of helping
>> it, particularly on architectures where it's unnecessary.  Since we generally
>> want to re-read the contents of an atomic variable on every access anyway,
>> let's standardize the behavior across all architectures and avoid the
>> performance and correctness problems of requiring the use of barrier() in
>> loops that expect atomic_t variables to change externally.  This is relevant
>> even on non-smp architectures, since drivers may use atomic operations in
>> interrupt handlers.
>>
>> Signed-off-by: Chris Snook <csnook@redhat.com>
> 
> Documentation/atomic_ops.txt would need updating:
> 
> 	[...]
> 
> 	One very important aspect of these two routines is that they DO NOT
> 	require any explicit memory barriers.  They need only perform the
> 	atomic_t counter update in an SMP safe manner.

Thanks, I was looking for that.  I'll re-send shortly with my comment 
moved there.  People are already using atomic_t in a manner that implies 
the use of memory barriers and interrupt-safety, which is what the patch 
enforces.

	-- Chris

^ permalink raw reply

* Re: [PATCH] make atomic_t volatile on all architectures
From: Chris Snook @ 2007-08-08 23:31 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: akpm, torvalds, ak, heiko.carstens, davem, linux-kernel, netdev,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx
In-Reply-To: <9a8748490708081618p43cdcdfdn5de6f292247def5b@mail.gmail.com>

Jesper Juhl wrote:
> On 09/08/2007, Chris Snook <csnook@redhat.com> wrote:
>> From: Chris Snook <csnook@redhat.com>
>>
>> Some architectures currently do not declare the contents of an atomic_t to be
>> volatile.  This causes confusion since atomic_read() might not actually read
>> anything if an optimizing compiler re-uses a value stored in a register, which
>> can break code that loops until something external changes the value of an
>> atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
>> of all registers used in the loop, thus hurting performance instead of helping
>> it, particularly on architectures where it's unnecessary.  Since we generally
>> want to re-read the contents of an atomic variable on every access anyway,
>> let's standardize the behavior across all architectures and avoid the
>> performance and correctness problems of requiring the use of barrier() in
>> loops that expect atomic_t variables to change externally.  This is relevant
>> even on non-smp architectures, since drivers may use atomic operations in
>> interrupt handlers.
>>
>> Signed-off-by: Chris Snook <csnook@redhat.com>
>>
> 
> Hmm, I thought we were trying to move away from volatile since it is
> very weakly defined and towards explicit barriers and locks...
> Points to --> Documentation/volatile-considered-harmful.txt

This is a special case.  Usually, the use of volatile is just lazy.  In 
this case, it's probably necessary on at least some architectures, so we 
can't remove it everywhere unless we want to rewrite atomic.h completely 
in inline assembler for several architectures, and painstakingly verify 
all that work.  I would hope it's obvious that having consistent 
behavior on all architectures, or even at all compiler optimization 
levels within an architecture, can be agreed to be good.  Additionally, 
atomic_t variables are a rare exception, in that we pretty much always 
want to work with the latest value in RAM on every access.  Making this 
atomic will allow us to remove a bunch of barriers which do nothing but 
slow things down on most architectures.

I agree that the use of atomic_t in .c files is generally bad, but in 
certain special cases, hiding it inside defined data types may be worth 
the slight impurity, just as we sometimes tolerate lines longer than 80 
columns when "fixing" it makes things unreadable.

	-- Chris

^ permalink raw reply

* Re: [PATCH] make atomic_t volatile on all architectures
From: Lennert Buytenhek @ 2007-08-08 23:25 UTC (permalink / raw)
  To: Chris Snook
  Cc: akpm, torvalds, ak, heiko.carstens, davem, linux-kernel, netdev,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx
In-Reply-To: <20070808230733.GA17270@shell.boston.redhat.com>

On Wed, Aug 08, 2007 at 07:07:33PM -0400, Chris Snook wrote:

> From: Chris Snook <csnook@redhat.com>
> 
> Some architectures currently do not declare the contents of an atomic_t to be
> volatile.  This causes confusion since atomic_read() might not actually read
> anything if an optimizing compiler re-uses a value stored in a register, which
> can break code that loops until something external changes the value of an
> atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
> of all registers used in the loop, thus hurting performance instead of helping
> it, particularly on architectures where it's unnecessary.  Since we generally
> want to re-read the contents of an atomic variable on every access anyway,
> let's standardize the behavior across all architectures and avoid the
> performance and correctness problems of requiring the use of barrier() in
> loops that expect atomic_t variables to change externally.  This is relevant
> even on non-smp architectures, since drivers may use atomic operations in
> interrupt handlers.
> 
> Signed-off-by: Chris Snook <csnook@redhat.com>

Documentation/atomic_ops.txt would need updating:

	[...]

	One very important aspect of these two routines is that they DO NOT
	require any explicit memory barriers.  They need only perform the
	atomic_t counter update in an SMP safe manner.

^ permalink raw reply

* Re: [PATCH RFC]: napi_struct V5
From: Shirley Ma @ 2007-08-08 23:23 UTC (permalink / raw)
  To: David Miller
  Cc: hadi, jgarzik, netdev, netdev-owner, rdreier, rusty, shemminger
In-Reply-To: <20070808.152059.101495015.davem@davemloft.net>

Dave,

> I reverted everything Roland had an issue with, I got tired of arguing
> my position and doing all of the coding too.  He won.
Thanks. I think the reschedule in the IPoIB poll should maintain fairness.

Shirley

^ permalink raw reply

* Re: [PATCH] make atomic_t volatile on all architectures
From: Jesper Juhl @ 2007-08-08 23:18 UTC (permalink / raw)
  To: Chris Snook
  Cc: akpm, torvalds, ak, heiko.carstens, davem, linux-kernel, netdev,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx
In-Reply-To: <20070808230733.GA17270@shell.boston.redhat.com>

On 09/08/2007, Chris Snook <csnook@redhat.com> wrote:
> From: Chris Snook <csnook@redhat.com>
>
> Some architectures currently do not declare the contents of an atomic_t to be
> volatile.  This causes confusion since atomic_read() might not actually read
> anything if an optimizing compiler re-uses a value stored in a register, which
> can break code that loops until something external changes the value of an
> atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
> of all registers used in the loop, thus hurting performance instead of helping
> it, particularly on architectures where it's unnecessary.  Since we generally
> want to re-read the contents of an atomic variable on every access anyway,
> let's standardize the behavior across all architectures and avoid the
> performance and correctness problems of requiring the use of barrier() in
> loops that expect atomic_t variables to change externally.  This is relevant
> even on non-smp architectures, since drivers may use atomic operations in
> interrupt handlers.
>
> Signed-off-by: Chris Snook <csnook@redhat.com>
>

Hmm, I thought we were trying to move away from volatile since it is
very weakly defined and towards explicit barriers and locks...
Points to --> Documentation/volatile-considered-harmful.txt


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply

* Re: [PATCH RFC]: napi_struct V6
From: David Miller @ 2007-08-08 23:18 UTC (permalink / raw)
  To: jeff; +Cc: shemminger, netdev, hadi, rusty
In-Reply-To: <46B9EEF7.7060009@garzik.org>

From: Jeff Garzik <jeff@garzik.org>
Date: Wed, 08 Aug 2007 12:27:35 -0400

> Someone please make sure Documentation/networking/netdevices.txt remains 
> correct with regards to APIs and locking.
> 
> It -is- up to date, unlike NAPI howto.  It should not change very much 
> due to this napi_struct work though, if at all.

I'll take care of it when I cut my net-2.6.24 tree later today.

^ permalink raw reply

* [PATCH] e1000e: Fix header includes
From: Auke Kok @ 2007-08-08 23:15 UTC (permalink / raw)
  To: jeff; +Cc: akpm, netdev, john.ronciak, auke-jan.h.kok

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
---

 drivers/net/e1000e/82571.c   |    4 ++++
 drivers/net/e1000e/e1000.h   |    7 ++++---
 drivers/net/e1000e/es2lan.c  |    5 +++++
 drivers/net/e1000e/ethtool.c |    3 ++-
 drivers/net/e1000e/hw.h      |    2 ++
 drivers/net/e1000e/ich8lan.c |    5 +++++
 drivers/net/e1000e/lib.c     |    2 ++
 drivers/net/e1000e/netdev.c  |    1 +
 8 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/net/e1000e/82571.c b/drivers/net/e1000e/82571.c
index ddf2303..0f8f0ac 100644
--- a/drivers/net/e1000e/82571.c
+++ b/drivers/net/e1000e/82571.c
@@ -37,6 +37,10 @@
  * 82573L Gigabit Ethernet Controller
  */
 
+#include <linux/netdevice.h>
+#include <linux/delay.h>
+#include <linux/pci.h>
+
 #include "e1000.h"
 
 #define ID_LED_RESERVED_F746 0xF746
diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index a1394d6..de17537 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -31,10 +31,11 @@
 #ifndef _E1000_H_
 #define _E1000_H_
 
+#include <linux/types.h>
+#include <linux/timer.h>
+#include <linux/workqueue.h>
+#include <linux/io.h>
 #include <linux/netdevice.h>
-#include <linux/ethtool.h>
-#include <linux/pci.h>
-#include <asm/io.h>
 
 #include "hw.h"
 
diff --git a/drivers/net/e1000e/es2lan.c b/drivers/net/e1000e/es2lan.c
index 5604c50..8100d03 100644
--- a/drivers/net/e1000e/es2lan.c
+++ b/drivers/net/e1000e/es2lan.c
@@ -31,6 +31,11 @@
  * 80003ES2LAN Gigabit Ethernet Controller (Serdes)
  */
 
+#include <linux/netdevice.h>
+#include <linux/ethtool.h>
+#include <linux/delay.h>
+#include <linux/pci.h>
+
 #include "e1000.h"
 
 #define E1000_KMRNCTRLSTA_OFFSET_FIFO_CTRL	 0x00
diff --git a/drivers/net/e1000e/ethtool.c b/drivers/net/e1000e/ethtool.c
index 6c417ea..a8fa1db 100644
--- a/drivers/net/e1000e/ethtool.c
+++ b/drivers/net/e1000e/ethtool.c
@@ -29,8 +29,9 @@
 /* ethtool support for e1000 */
 
 #include <linux/netdevice.h>
-
 #include <linux/ethtool.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
 
 #include "e1000.h"
 
diff --git a/drivers/net/e1000e/hw.h b/drivers/net/e1000e/hw.h
index 4d562c4..848217a 100644
--- a/drivers/net/e1000e/hw.h
+++ b/drivers/net/e1000e/hw.h
@@ -29,6 +29,8 @@
 #ifndef _E1000_HW_H_
 #define _E1000_HW_H_
 
+#include <linux/types.h>
+
 struct e1000_hw;
 struct e1000_adapter;
 
diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index 042abd4..85095af 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -40,6 +40,11 @@
  * 82566MM Gigabit Network Connection
  */
 
+#include <linux/netdevice.h>
+#include <linux/ethtool.h>
+#include <linux/delay.h>
+#include <linux/pci.h>
+
 #include "e1000.h"
 
 #define ICH_FLASH_GFPREG		0x0000
diff --git a/drivers/net/e1000e/lib.c b/drivers/net/e1000e/lib.c
index 3bbe63e..c92ea77 100644
--- a/drivers/net/e1000e/lib.c
+++ b/drivers/net/e1000e/lib.c
@@ -27,6 +27,8 @@
 *******************************************************************************/
 
 #include <linux/netdevice.h>
+#include <linux/ethtool.h>
+#include <linux/delay.h>
 #include <linux/pci.h>
 
 #include "e1000.h"
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 741965d..0ea6eda 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -29,6 +29,7 @@
 #include <linux/module.h>
 #include <linux/types.h>
 #include <linux/init.h>
+#include <linux/pci.h>
 #include <linux/vmalloc.h>
 #include <linux/pagemap.h>
 #include <linux/netdevice.h>

^ permalink raw reply related

* [PATCH] make atomic_t volatile on all architectures
From: Chris Snook @ 2007-08-08 23:07 UTC (permalink / raw)
  To: akpm, torvalds, ak, heiko.carstens
  Cc: davem, linux-kernel, netdev, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx

From: Chris Snook <csnook@redhat.com>

Some architectures currently do not declare the contents of an atomic_t to be
volatile.  This causes confusion since atomic_read() might not actually read
anything if an optimizing compiler re-uses a value stored in a register, which
can break code that loops until something external changes the value of an
atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
of all registers used in the loop, thus hurting performance instead of helping
it, particularly on architectures where it's unnecessary.  Since we generally
want to re-read the contents of an atomic variable on every access anyway,
let's standardize the behavior across all architectures and avoid the
performance and correctness problems of requiring the use of barrier() in
loops that expect atomic_t variables to change externally.  This is relevant
even on non-smp architectures, since drivers may use atomic operations in
interrupt handlers.

Signed-off-by: Chris Snook <csnook@redhat.com>


diff -urp linux-2.6.23-rc2-orig/include/asm-blackfin/atomic.h linux-2.6.23-rc2/include/asm-blackfin/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-blackfin/atomic.h	2007-07-08 19:32:17.000000000 -0400
+++ linux-2.6.23-rc2/include/asm-blackfin/atomic.h	2007-08-08 18:18:43.000000000 -0400
@@ -13,8 +13,13 @@
  * Tony Kou (tonyko@lineo.ca)   Lineo Inc.   2001
  */
 
+/*
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
+ */
 typedef struct {
-	int counter;
+	volatile int counter;
 } atomic_t;
 #define ATOMIC_INIT(i)	{ (i) }
 
diff -urp linux-2.6.23-rc2-orig/include/asm-frv/atomic.h linux-2.6.23-rc2/include/asm-frv/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-frv/atomic.h	2007-07-08 19:32:17.000000000 -0400
+++ linux-2.6.23-rc2/include/asm-frv/atomic.h	2007-08-08 18:21:55.000000000 -0400
@@ -35,8 +35,13 @@
 #define smp_mb__before_atomic_inc()	barrier()
 #define smp_mb__after_atomic_inc()	barrier()
 
+/*
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
+ */
 typedef struct {
-	int counter;
+	volatile int counter;
 } atomic_t;
 
 #define ATOMIC_INIT(i)		{ (i) }
diff -urp linux-2.6.23-rc2-orig/include/asm-h8300/atomic.h linux-2.6.23-rc2/include/asm-h8300/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-h8300/atomic.h	2007-07-08 19:32:17.000000000 -0400
+++ linux-2.6.23-rc2/include/asm-h8300/atomic.h	2007-08-08 18:24:02.000000000 -0400
@@ -6,7 +6,12 @@
  * resource counting etc..
  */
 
-typedef struct { int counter; } atomic_t;
+/*
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
+ */
+typedef struct { volatile int counter; } atomic_t;
 #define ATOMIC_INIT(i)	{ (i) }
 
 #define atomic_read(v)		((v)->counter)
diff -urp linux-2.6.23-rc2-orig/include/asm-i386/atomic.h linux-2.6.23-rc2/include/asm-i386/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-i386/atomic.h	2007-07-08 19:32:17.000000000 -0400
+++ linux-2.6.23-rc2/include/asm-i386/atomic.h	2007-08-08 18:26:09.000000000 -0400
@@ -14,8 +14,12 @@
  * Make sure gcc doesn't try to be clever and move things around
  * on us. We need to use _exactly_ the address the user gave us,
  * not some alias that contains the same information.
+ *
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
  */
-typedef struct { int counter; } atomic_t;
+typedef struct { volatile int counter; } atomic_t;
 
 #define ATOMIC_INIT(i)	{ (i) }
 
diff -urp linux-2.6.23-rc2-orig/include/asm-m68k/atomic.h linux-2.6.23-rc2/include/asm-m68k/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-m68k/atomic.h	2007-07-08 19:32:17.000000000 -0400
+++ linux-2.6.23-rc2/include/asm-m68k/atomic.h	2007-08-08 18:28:58.000000000 -0400
@@ -13,7 +13,12 @@
  * We do not have SMP m68k systems, so we don't have to deal with that.
  */
 
-typedef struct { int counter; } atomic_t;
+/*
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
+ */
+typedef struct { volatile int counter; } atomic_t;
 #define ATOMIC_INIT(i)	{ (i) }
 
 #define atomic_read(v)		((v)->counter)
diff -urp linux-2.6.23-rc2-orig/include/asm-m68knommu/atomic.h linux-2.6.23-rc2/include/asm-m68knommu/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-m68knommu/atomic.h	2007-07-08 19:32:17.000000000 -0400
+++ linux-2.6.23-rc2/include/asm-m68knommu/atomic.h	2007-08-08 18:30:32.000000000 -0400
@@ -12,7 +12,12 @@
  * We do not have SMP m68k systems, so we don't have to deal with that.
  */
 
-typedef struct { int counter; } atomic_t;
+/*
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
+ */
+typedef struct { volatile int counter; } atomic_t;
 #define ATOMIC_INIT(i)	{ (i) }
 
 #define atomic_read(v)		((v)->counter)
diff -urp linux-2.6.23-rc2-orig/include/asm-s390/atomic.h linux-2.6.23-rc2/include/asm-s390/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-s390/atomic.h	2007-08-08 17:48:53.000000000 -0400
+++ linux-2.6.23-rc2/include/asm-s390/atomic.h	2007-08-08 18:40:17.000000000 -0400
@@ -23,8 +23,13 @@
  * S390 uses 'Compare And Swap' for atomicity in SMP enviroment
  */
 
+/*
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
+ */
 typedef struct {
-	int counter;
+	volatile int counter;
 } __attribute__ ((aligned (4))) atomic_t;
 #define ATOMIC_INIT(i)  { (i) }
 
diff -urp linux-2.6.23-rc2-orig/include/asm-v850/atomic.h linux-2.6.23-rc2/include/asm-v850/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-v850/atomic.h	2007-07-08 19:32:17.000000000 -0400
+++ linux-2.6.23-rc2/include/asm-v850/atomic.h	2007-08-08 18:42:37.000000000 -0400
@@ -21,7 +21,12 @@
 #error SMP not supported
 #endif
 
-typedef struct { int counter; } atomic_t;
+/*
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
+ */
+typedef struct { volatile int counter; } atomic_t;
 
 #define ATOMIC_INIT(i)	{ (i) }
 
diff -urp linux-2.6.23-rc2-orig/include/asm-x86_64/atomic.h linux-2.6.23-rc2/include/asm-x86_64/atomic.h
--- linux-2.6.23-rc2-orig/include/asm-x86_64/atomic.h	2007-07-08 19:32:17.000000000 -0400
+++ linux-2.6.23-rc2/include/asm-x86_64/atomic.h	2007-08-08 18:44:18.000000000 -0400
@@ -21,8 +21,12 @@
  * Make sure gcc doesn't try to be clever and move things around
  * on us. We need to use _exactly_ the address the user gave us,
  * not some alias that contains the same information.
+ *
+ * Make atomic_t volatile to remove the need for barriers in loops that
+ * wait for an outside event.  We generally want to re-load the atomic_t
+ * variable each time anyway, but don't need to re-load everything else.
  */
-typedef struct { int counter; } atomic_t;
+typedef struct { volatile int counter; } atomic_t;
 
 #define ATOMIC_INIT(i)	{ (i) }
 

^ permalink raw reply

* [ofa-general] Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
From: jamal @ 2007-08-08 22:53 UTC (permalink / raw)
  To: David Miller
  Cc: johnpol, peter.p.waskiewicz.jr, herbert, gaagaan, Robert.Olsson,
	kumarkr, rdreier, mcarlson, netdev, jagana, general, mchan, tgraf,
	jeff, shemminger, kaber, sri
In-Reply-To: <20070808.152240.56052317.davem@davemloft.net>

On Wed, 2007-08-08 at 15:22 -0700, David Miller wrote:

> The driver path, however, does not exist on an island and what
> we care about is the final result with the changes running
> inside the full system.
> 
> So, to be honest, besides for initial internal development
> feedback, the isolated tests only have minimal merit and
> it's the full protocol tests that are really interesting.

But you cant go there if cant show the path which is supposedly improved
has indeed improved;-> I would certainly agree with you that if it
doesnt prove consistently useful with protocols it has no value
(remember thats why i never submitted these patches all this time).
We just need better analysis of the results - i cant ignore that the
selection of the clock sources for example gives me different results
and that when i boot i cant be guaranteed the same clock source. I cant
ignore the fact that i get different results when i use a different
congestion control algorithm. And none of this has to do with the
batching patches.

I am using UDP at the moment because it is simpler to analyze. And yes,
it would be "an interesting idea that gets shelved" if we cant achieve
any of the expected goals. We've shelved ideas before. BTW, read the
little doc i wrote on the dev->prep_xmit() you may find it interesting.

cheers,
jamal

^ permalink raw reply

* [ofa-general] Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
From: jamal @ 2007-08-08 22:40 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: johnpol, peter.p.waskiewicz.jr, Herbert Xu, gaagaan,
	Robert.Olsson, kumarkr, rdreier, mcarlson, David Miller, jagana,
	general, mchan, tgraf, jeff, netdev, kaber, sri
In-Reply-To: <20070808215501.4825d74b@oldman>

On Wed, 2007-08-08 at 21:55 +0100, Stephen Hemminger wrote:

> > pktgen shows a clear win if you test the driver path - which is what you
> > should test because thats where the batching changes are. 
> > Using TCP or UDP adds other variables[1] that need to be isolated first
> > in order to quantify the effect of batching. 
> > For throughput and CPU utilization, the benefit will be clear when there
> > are a lot more flows. 
> 
> Optimizing for pktgen is a mistake for most users. 

There is no "optimization for pktgen". 

If you are improving the tx path, the first step is to test that you can
show the tx path improved. pktgen happens to be the best test suite for
that because it talks to the driver and exercises the changes.
i.e if one cant show that exercising the direct path demonstrates
improvements you probably wont be able to show batching improves TCP -
but i dont even wanna swear by that. 
Does that make sense?

> Please show something
> useful like router forwarding, TCP (single and multi flow) and/or better
> yet application benchmark improvement.

Absolutely, but first things first. Analysis of why something improves
is extremely important, just saying "TCP throughput improved" is not
interesting and lazy. 
To be scientific, it is important to isolate variables first in order to
come up with meaningful results that can be analysed. 
To make a point, I have noticed extremely different results between TCP
BIC vs reno with batching. So congestion control as a variable is
important. 

cheers,
jamal

^ permalink raw reply

* Re: [patch] ipvs: force read of atomic_t in while loop
From: Chris Snook @ 2007-08-08 22:38 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Andrew Morton, Linus Torvalds, Andi Kleen, andi, David Miller,
	linux-kernel, netdev, schwidefsky, wensong, horms
In-Reply-To: <20070808222703.GA11359@osiris.ibm.com>

Heiko Carstens wrote:
> On Wed, Aug 08, 2007 at 02:31:15PM -0700, Andrew Morton wrote:
>> On Wed, 08 Aug 2007 17:08:44 -0400
>> Chris Snook <csnook@redhat.com> wrote:
>>
>>> Heiko Carstens wrote:
>>>> On Wed, Aug 08, 2007 at 03:21:31AM -0700, David Miller wrote:
>>>>> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>>>>> Date: Wed, 8 Aug 2007 11:33:00 +0200
>>>>>
>>>>>> Just saw this while grepping for atomic_reads in a while loops.
>>>>>> Maybe we should re-add the volatile to atomic_t. Not sure.
>>>>> I think whatever the choice, it should be done consistently
>>>>> on every architecture.
>>>>>
>>>>> It's just asking for trouble if your arch does it differently from
>>>>> every other.
>>>> Well..currently it's i386/x86_64 and s390 which have no volatile
>>>> in atomic_t. And yes, of course I agree it should be consistent
>>>> across all architectures. But it isn't.
>>> Based on recent discussion, it's pretty clear that there's a lot of 
>>> confusion about this.  A lot of people (myself included, until I thought 
>>> about it long and hard) will reasonably assume that calling 
>>> atomic_read() will actually read the value from memory.  Leaving out the 
>>> volatile declaration seems like a pessimization to me.  If you force 
>>> people to use barrier() everywhere they're working with atomic_t, it 
>>> will force re-reads of all the non-atomic data in use as well, which 
>>> will cause more memory fetches of things that generally don't need 
>>> barrier().  That and it's a bug waiting to happen.
>>>
>>> Andi -- your thoughts on the matter?
>> I'm not Andi, but this not-Andi thinks that permitting the compiler to cache
>> the results of atomic_read() is dumb.
> 
> Ok, how about this:
> 
> Subject: [PATCH] Add 'volatile' to atomic_t again.
> 
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> 
> This basically reverts f9e9dcb38f5106fa8cdac04a9e967d5487f1cd20 which
> removed 'volatile' from atomic_t for i386/x86_64. Reason for this
> is to make sure that code like
> while (atomic_read(&whatever));
> continues to work.
> Otherwise the compiler might generate code that will loop forever.
> Also this makes sure atomic_t is the same across all architectures.
> 
> Cc: Andi Kleen <ak@suse.de>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> ---
> 
> s390 patch will go in via Martin if this is accepted.
> 
>  include/asm-i386/atomic.h   |    2 +-
>  include/asm-x86_64/atomic.h |    2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> Index: linux-2.6/include/asm-i386/atomic.h
> ===================================================================
> --- linux-2.6.orig/include/asm-i386/atomic.h
> +++ linux-2.6/include/asm-i386/atomic.h
> @@ -15,7 +15,7 @@
>   * on us. We need to use _exactly_ the address the user gave us,
>   * not some alias that contains the same information.
>   */
> -typedef struct { int counter; } atomic_t;
> +typedef struct { volatile int counter; } atomic_t;
>  
>  #define ATOMIC_INIT(i)	{ (i) }
>  
> Index: linux-2.6/include/asm-x86_64/atomic.h
> ===================================================================
> --- linux-2.6.orig/include/asm-x86_64/atomic.h
> +++ linux-2.6/include/asm-x86_64/atomic.h
> @@ -22,7 +22,7 @@
>   * on us. We need to use _exactly_ the address the user gave us,
>   * not some alias that contains the same information.
>   */
> -typedef struct { int counter; } atomic_t;
> +typedef struct { volatile int counter; } atomic_t;
>  
>  #define ATOMIC_INIT(i)	{ (i) }
>  

Good so far, but we need to fix it on non-SMP architectures too, since 
drivers may use atomic_t in interrupt code.  Ideally I'd like to be able 
to remove a whole bunch of barriers, since they cause a lot of needless 
re-fetches for everything else in the loop.  We should also document the 
semantics of atomic_t to ensure consistency in the future.

	-- Chris

^ permalink raw reply

* Re: [patch] ipvs: force read of atomic_t in while loop
From: Heiko Carstens @ 2007-08-08 22:27 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds, Andi Kleen
  Cc: Chris Snook, andi, David Miller, linux-kernel, netdev,
	schwidefsky, wensong, horms
In-Reply-To: <20070808143115.de539511.akpm@linux-foundation.org>

On Wed, Aug 08, 2007 at 02:31:15PM -0700, Andrew Morton wrote:
> On Wed, 08 Aug 2007 17:08:44 -0400
> Chris Snook <csnook@redhat.com> wrote:
> 
> > Heiko Carstens wrote:
> > > On Wed, Aug 08, 2007 at 03:21:31AM -0700, David Miller wrote:
> > >> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> > >> Date: Wed, 8 Aug 2007 11:33:00 +0200
> > >>
> > >>> Just saw this while grepping for atomic_reads in a while loops.
> > >>> Maybe we should re-add the volatile to atomic_t. Not sure.
> > >> I think whatever the choice, it should be done consistently
> > >> on every architecture.
> > >>
> > >> It's just asking for trouble if your arch does it differently from
> > >> every other.
> > > 
> > > Well..currently it's i386/x86_64 and s390 which have no volatile
> > > in atomic_t. And yes, of course I agree it should be consistent
> > > across all architectures. But it isn't.
> > 
> > Based on recent discussion, it's pretty clear that there's a lot of 
> > confusion about this.  A lot of people (myself included, until I thought 
> > about it long and hard) will reasonably assume that calling 
> > atomic_read() will actually read the value from memory.  Leaving out the 
> > volatile declaration seems like a pessimization to me.  If you force 
> > people to use barrier() everywhere they're working with atomic_t, it 
> > will force re-reads of all the non-atomic data in use as well, which 
> > will cause more memory fetches of things that generally don't need 
> > barrier().  That and it's a bug waiting to happen.
> > 
> > Andi -- your thoughts on the matter?
> 
> I'm not Andi, but this not-Andi thinks that permitting the compiler to cache
> the results of atomic_read() is dumb.

Ok, how about this:

Subject: [PATCH] Add 'volatile' to atomic_t again.

From: Heiko Carstens <heiko.carstens@de.ibm.com>

This basically reverts f9e9dcb38f5106fa8cdac04a9e967d5487f1cd20 which
removed 'volatile' from atomic_t for i386/x86_64. Reason for this
is to make sure that code like
while (atomic_read(&whatever));
continues to work.
Otherwise the compiler might generate code that will loop forever.
Also this makes sure atomic_t is the same across all architectures.

Cc: Andi Kleen <ak@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---

s390 patch will go in via Martin if this is accepted.

 include/asm-i386/atomic.h   |    2 +-
 include/asm-x86_64/atomic.h |    2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6/include/asm-i386/atomic.h
===================================================================
--- linux-2.6.orig/include/asm-i386/atomic.h
+++ linux-2.6/include/asm-i386/atomic.h
@@ -15,7 +15,7 @@
  * on us. We need to use _exactly_ the address the user gave us,
  * not some alias that contains the same information.
  */
-typedef struct { int counter; } atomic_t;
+typedef struct { volatile int counter; } atomic_t;
 
 #define ATOMIC_INIT(i)	{ (i) }
 
Index: linux-2.6/include/asm-x86_64/atomic.h
===================================================================
--- linux-2.6.orig/include/asm-x86_64/atomic.h
+++ linux-2.6/include/asm-x86_64/atomic.h
@@ -22,7 +22,7 @@
  * on us. We need to use _exactly_ the address the user gave us,
  * not some alias that contains the same information.
  */
-typedef struct { int counter; } atomic_t;
+typedef struct { volatile int counter; } atomic_t;
 
 #define ATOMIC_INIT(i)	{ (i) }
 

^ permalink raw reply

* [ofa-general] Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
From: David Miller @ 2007-08-08 22:22 UTC (permalink / raw)
  To: hadi
  Cc: johnpol, peter.p.waskiewicz.jr, herbert, gaagaan, Robert.Olsson,
	kumarkr, rdreier, mcarlson, netdev, jagana, general, mchan, tgraf,
	jeff, shemminger, kaber, sri
In-Reply-To: <1186586075.5155.27.camel@localhost>

From: jamal <hadi@cyberus.ca>
Date: Wed, 08 Aug 2007 11:14:35 -0400

> pktgen shows a clear win if you test the driver path - which is what
> you should test because thats where the batching changes are.

The driver path, however, does not exist on an island and what
we care about is the final result with the changes running
inside the full system.

So, to be honest, besides for initial internal development
feedback, the isolated tests only have minimal merit and
it's the full protocol tests that are really interesting.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox