[PATCH] tg3_msi() and weakly ordered memory

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] tg3_msi() and weakly ordered memory
@ 2005-06-14  3:37 Grant Grundler
  0 siblings, 0 replies; 8+ messages in thread
From: Grant Grundler @ 2005-06-14  3:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: mchan, netdev, iod00d

Dave,
I reviewed the "new" (to me) use of tags and MSI in tg3_msi() and
tg3_poll() and I like the new scheme. It's pretty clean.

But I did come up with four potential "issues" - mostly revolving
around enforcing order of memory access on weakly ordered platforms:

1) tg3_poll() and tg3_msi() are not consistent with use of rmb().
   tg3_poll has an rmb() between reading status_tag and tg3_has_work().
   The patch (against tg3 v3.29) below adds a similar rmb() to tg3_msi().

   Does tg3_msi() need a "rmb()" like in the attached patch?
   Or rather a mb() to deal with clearing SD_STATUS_UPDATED bit?

2) tg3_poll() and tg3_msi() are not consistent on how they clear
   the SD_STATUS_UPDATED bit. tg3_poll() does not clear SD_STATUS_UPDATED
   bit after reading status_tag. I think everytime the driver discovers
   the status_tag changed, it should to clear SD_STATUS_UPDATED.
   Michael, can you confirm/deny that offhand?

   I'm not sure anymore what order the sblk fields (status_tag, tx_consumer,
   and rx_producer) should be read before clearing SD_STATUS_UPDATED bit.
   I expect a recommended order exists.
   ISTR something like:
	read status_tag
	rmb()
	read tx_consumer and rx_producer
	mb()
	clear SD_STATUS_UPDATED

3) Based on the above sequence, tg3 might need one more rmb() between
   reading sblk status_tag and the inline code for tg3_has_work(). 

4) I'd also prefer if tg3 would read tx_consumer/rx_producer fields
   *only* in tg3_msi() and tg3_poll() when sblk status_tag is read.
   All other references (e.g. tg3_has_work(), tg3_rx(), etc) would use
   a cached copy of those fields.
   My goal would be to reduce the competition for access to sblk
   cacheline and get the memory ordering issues right.
   My fear is regularly reading the cacheline by the CPU will take
   away exclusive (write) access from the IO subsystem and ping-pong
   the cacheline more often than necessary.
   Would you entertain a patch for this?

thanks,
grant

Signed-off-by: Grant Grundler <iodood@hp.com>

--- a/drivers/net/tg3.c	25 May 2005 17:12:47 -0000	1.35
+++ b/drivers/net/tg3.c	14 Jun 2005 01:37:43 -0000
@@ -2946,6 +2946,7 @@ static irqreturn_t tg3_msi(int irq, void
 	 */
 	tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000001);
 	tp->last_tag = sblk->status_tag;
+	rmb():
 	sblk->status &= ~SD_STATUS_UPDATED;
 	if (likely(tg3_has_work(tp)))
 		netif_rx_schedule(dev);		/* schedule NAPI poll */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tg3_msi() and weakly ordered memory
       [not found] <B1508D50A0692F42B217C22C02D84972067F0804@NT-IRVA-0741.brcm.ad.broadcom.com>
@ 2005-06-14 15:40 ` Grant Grundler
       [not found]   ` <1118767397.7059.19.camel@rh4>
  2005-06-14 17:55 ` Grant Grundler
  1 sibling, 1 reply; 8+ messages in thread
From: Grant Grundler @ 2005-06-14 15:40 UTC (permalink / raw)
  To: Michael Chan; +Cc: Grant Grundler, David S. Miller, netdev

On Mon, Jun 13, 2005 at 11:46:47PM -0700, Michael Chan wrote:
> Yes, you're right. rmb() is needed between reading the tag and tg3_has_work()
> to guarantee strict ordering. Otherwise, tg3_has_work() may get ahead and
> read stale information that may be older than the tag.

Ok - thanks for confirming. I just wasn't sure any more.
It's been a year or so since I looked at this last time.

> But the clearing of
> the SD_STATUS_UPDATED bit does not need any additional barriers.
...
> You're right again. The SD_STATUS_UPDATED bit should be cleared right before
> checking for new work. Clearing the SD_STATUS_UPDATED bit tells the non-msi
> irq handler that all work up to the last status block update has been
> processed.
...
> The clearing of the SD_STATUS_UPDATED bit does not have to follow very strict
> ordering for the following reasons:
> 
> 1. It has no hardware significance. It is purely to tell the irq handler that
> the current status block has been processed. For MSI, since we don't even
> check that bit in tg3_msi(), we can skip clearing that bit. But I think it is
> safer to clear it because tg3_cond_int() is checking it.

Ok - I thought the NIC was reading that back for some reason.
If we can ignore SD_STATUS_UPDATED and use a flag not related
to sblk, I think it would be cacheline friendlier. But it's
a minor issue.

> 2. We only clear it when interrupt from the NIC is disabled, either in irq
> handler or tg3_poll(). So there is no potential contention.
> 
> So the current sequence is fine.

ok - thank you for the clarification.

> It is important to read the actual status block with the latest indices to
> determine whether there is new work, especially in the non-tagged case where
> you may have race condition between software and hardware.

Certainly...my point was we should not read them on every iteration
of the RX or TX loop that processes packets - but rather outside that loop.
And I'd think we want to read the three values (status tag, tx_consumer,
and rx_producer) as a set - ie read them and process stuff until
we've exhausted the "work quota" or the driver has caught up
to the HW (status tag stops changing).

I don't see a problem with exiting tg3_poll despite more work
pending if we know we are going to catch it on the next round.
After all, we are polling - latency is going to be semi random
depending on where we are in the sequence anyway.

thanks,
grant

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tg3_msi() and weakly ordered memory
       [not found] <B1508D50A0692F42B217C22C02D84972067F0805@NT-IRVA-0741.brcm.ad.broadcom.com>
@ 2005-06-14 15:46 ` Grant Grundler
       [not found]   ` <1118771563.7059.30.camel@rh4>
  0 siblings, 1 reply; 8+ messages in thread
From: Grant Grundler @ 2005-06-14 15:46 UTC (permalink / raw)
  To: Michael Chan; +Cc: David S. Miller, iod00d, netdev

On Mon, Jun 13, 2005 at 11:54:23PM -0700, Michael Chan wrote:
> > Once you write "0x1" to the mailbox register, the device stops
> > updating the status block and stops generating interrupts.
> > 
> > That is what makes a lot of things safe.
> 
> Only interrupts are stopped, status block will still be updated subject to
> during-ints coalescing.

Will setting during-ints to a very high threshhold essentially allow
us to "indefinitely" process stuff without taking any interrupts?
Would the threshhold counter get reset every time we write back
the status tag WITHOUT re-enableing interrupts?

If not, I suspect the CPU will circulate in tg3_poll until during-ints
is exhausted and DMA will stop until CPU reenables interrupts.
ie not until it's done processing outstanding packets.

thanks,
grant

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tg3_msi() and weakly ordered memory
       [not found] <B1508D50A0692F42B217C22C02D84972067F0804@NT-IRVA-0741.brcm.ad.broadcom.com>
  2005-06-14 15:40 ` [PATCH] tg3_msi() and weakly ordered memory Grant Grundler
@ 2005-06-14 17:55 ` Grant Grundler
  1 sibling, 0 replies; 8+ messages in thread
From: Grant Grundler @ 2005-06-14 17:55 UTC (permalink / raw)
  To: Michael Chan; +Cc: Grant Grundler, David S. Miller, netdev

On Mon, Jun 13, 2005 at 11:46:47PM -0700, Michael Chan wrote:
> rmb() is needed between reading the tag and tg3_has_work()
> to guarantee strict ordering.

Thinking about this more... tg3_has_work() could be reduced to 
comparing status tag with last_tag (vs each of the TX/RX indices).
That assumes all the tg3 NICs support status tags...if not, then we
have to keep checking indices.

[ BTW, I noticed spin_lock*(&tp->lock) calls are nested in tg3_poll.
That's a bug, right? I'm still looking at v3.29 ]

The current implementation of tg3_poll() processes TX and then RX.
The status tag we read afterwards and the TX/RX indices checked 
could be newer than the TX/RX indices used during processing.
Is tg3 then roughly rate limited to the TX and RX queue depth
per poll interval?
(I'm still thinking during-ints limits how much DMA can occur)

Given TG3_TX_RING_SIZE is 512, then I would max out at ~500Kpps
if there is any RX traffic that causes tg3_has_work() to come
back true. While this might be normally ok, I'm looking to
maximize pktgen output w/o disabling/enabling interrupts
for each "batch" of TX packets.

> > 2) tg3_poll() and tg3_msi() are not consistent on how they clear
> >    the SD_STATUS_UPDATED bit. tg3_poll() does not clear SD_STATUS_UPDATED
> >    bit after reading status_tag. I think everytime the driver discovers
> >    the status_tag changed, it should to clear SD_STATUS_UPDATED.
> >    Michael, can you confirm/deny that offhand?
> 
> You're right again. The SD_STATUS_UPDATED bit should be cleared right before
> checking for new work. Clearing the SD_STATUS_UPDATED bit tells the non-msi
> irq handler that all work up to the last status block update has been
> processed.

If I understood this correctly, tg3 may already have new work pending
when tg3_has_work() is called from tg3_poll().  tg3_poll() does not tell
the card anything but promises to pick up where it left off the
next time tg3_poll() is called. If we don't tell the card anything,
it means at some point it's going to stop doing DMA....this might be
one of the things preventing tg3 from doing link rate with pktgen
pushing 64byte packets.

...
> It is important to read the actual status block with the latest indices to
> determine whether there is new work, especially in the non-tagged case where
> you may have race condition between software and hardware.

Yes - I think I understand were several of the races can occur. 
Probably not seeing all of them though.

thanks again,
grant

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tg3_msi() and weakly ordered memory
       [not found]   ` <1118767397.7059.19.camel@rh4>
@ 2005-06-14 18:04     ` Grant Grundler
  0 siblings, 0 replies; 8+ messages in thread
From: Grant Grundler @ 2005-06-14 18:04 UTC (permalink / raw)
  To: Michael Chan; +Cc: Grant Grundler, David S. Miller, netdev

On Tue, Jun 14, 2005 at 09:43:17AM -0700, Michael Chan wrote:
...
> Something like:
> 
> if (sblk->status_tag != tp->last_tag)
> 	clear_interrupt();
> 	netif_rx_schedule();
> 
> This way we don't have to clear the SD_STATUS_UPDATED bit. I will
> experiment with this and see if it works well.

that sounds good - thanks.

> I don't think we are reading the index on every iteration. In tg3_rx(),
> we read it at the beginning before the loop, and one more time if we
> have caught up with the hw index before exiting the loop.

oh - sorry - my bad. Same is true for tg3_tx().

And I just noticed I'm smoking crack on "nested locks" too...
one is "lock" and the other is "tx_lock".
*sigh* - need more sleep.

> I mildly disagree. I think we should maximize the amount of work done in
> tg3_poll(). For example, reading the rx_producer index one more time
> when we have caught up with hw index before exiting the loop is a good
> thing IMO.

ok.

thanks,
grant

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tg3_msi() and weakly ordered memory
       [not found]     ` <20050614211530.GB25516@esmail.cup.hp.com>
@ 2005-06-21 23:56       ` David S. Miller
  2005-06-22  5:20         ` Grant Grundler
  2005-06-22 12:56         ` [PATCH] dont use strlen() but the result from a prior sprintf() Eric Dumazet
  0 siblings, 2 replies; 8+ messages in thread
From: David S. Miller @ 2005-06-21 23:56 UTC (permalink / raw)
  To: iod00d; +Cc: mchan, netdev

Ok, here is the patch I came up with as a result of this thread.

Michael stated he would investigate using a pure tag comparison in
place of tg3_has_work() when the chip is using tagged interrupts.

Thanks.

[TG3]: Fix missing memory barriers and SD_STATUS_UPDATED bit clearing.

There must be a rmb() between reading the status block tag
and calling tg3_has_work().  This was missing in tg3_mis()
and tg3_interrupt_tagged().  tg3_poll() got it right.

Also, SD_STATUS_UPDATED must be cleared in the status block
right before we call tg3_has_work().  Only tg3_poll() got this
wrong.

Based upon patches and commentary from Grant Grundler and
Michael Chan.

Signed-off-by: David S. Miller <davem@davemloft.net>

--- 1/drivers/net/tg3.c.~1~	2005-06-21 16:39:19.000000000 -0700
+++ 2/drivers/net/tg3.c	2005-06-21 16:47:55.000000000 -0700
@@ -2929,6 +2929,7 @@
 	if (tp->tg3_flags & TG3_FLAG_TAGGED_STATUS)
 		tp->last_tag = sblk->status_tag;
 	rmb();
+	sblk->status &= ~SD_STATUS_UPDATED;

 	/* if no more work, tell net stack and NIC we're done */
 	done = !tg3_has_work(tp);
@@ -2964,6 +2965,7 @@
 	 */
 	tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000001);
 	tp->last_tag = sblk->status_tag;
+	rmb();
 	sblk->status &= ~SD_STATUS_UPDATED;
 	if (likely(tg3_has_work(tp)))
 		netif_rx_schedule(dev);		/* schedule NAPI poll */
@@ -3051,6 +3053,7 @@
 		tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
 			     0x00000001);
 		tp->last_tag = sblk->status_tag;
+		rmb();
 		sblk->status &= ~SD_STATUS_UPDATED;
 		if (likely(tg3_has_work(tp)))
 			netif_rx_schedule(dev);		/* schedule NAPI poll */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tg3_msi() and weakly ordered memory
  2005-06-21 23:56       ` David S. Miller
@ 2005-06-22  5:20         ` Grant Grundler
  2005-06-22 12:56         ` [PATCH] dont use strlen() but the result from a prior sprintf() Eric Dumazet
  1 sibling, 0 replies; 8+ messages in thread
From: Grant Grundler @ 2005-06-22  5:20 UTC (permalink / raw)
  To: David S. Miller; +Cc: iod00d, mchan, netdev

On Tue, Jun 21, 2005 at 04:56:34PM -0700, David S. Miller wrote:
> 
> Ok, here is the patch I came up with as a result of this thread.

looks good to me.

> Michael stated he would investigate using a pure tag comparison in
> place of tg3_has_work() when the chip is using tagged interrupts.

The more I think about it, the more I like the idea of each ISR calling
into a different tg3_poll routine. The specific _poll() routine could
do the "is there more work" checking instead the TX/RX ring
cleanup code. The main reason is the "more work" checks can
be better optimized for MSI (use tags) vs IRQ Line interrupt (use ring
indices) handlers. I also hope to reduce cacheline movement by touching
the status block fewer times.

This isn't a trivial patch and I'm short on time (preparing stuff
for OLS and HP World before my vacation). If there is still interest,
I can prototype a patch in late August or Sept (about 8 weeks from now).

> Thanks.

Welcome and thanks too.

BTW, I greatly appreciate Michael clarifying tg3 behavior.

thanks,
grant

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] dont use strlen() but the result from a prior sprintf()
  2005-06-21 23:56       ` David S. Miller
  2005-06-22  5:20         ` Grant Grundler
@ 2005-06-22 12:56         ` Eric Dumazet
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2005-06-22 12:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 769 bytes --]

Hi David

Small patch to save an unecessary call to strlen() : sprintf() gave us the length, just trust it.

Thank you

Eric Dumazet


diff -Nu linux-2.6.12-orig/net/socket.c linux-2.6.12/net/socket.c
--- linux-2.6.12-orig/net/socket.c      2005-06-22 14:47:56.000000000 +0200
+++ linux-2.6.12/net/socket.c   2005-06-22 14:49:22.000000000 +0200
@@ -382,9 +382,8 @@
                         goto out;
                 }

-               sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino);
+               this.len = sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino);
                 this.name = name;
-               this.len = strlen(name);
                 this.hash = SOCK_INODE(sock)->i_ino;

                 file->f_dentry = d_alloc(sock_mnt->mnt_sb->s_root, &this);


[-- Attachment #2: patch.1 --]
[-- Type: text/plain, Size: 446 bytes --]

--- linux-2.6.12-orig/net/socket.c	2005-06-22 14:47:56.000000000 +0200
+++ linux-2.6.12/net/socket.c	2005-06-22 14:49:22.000000000 +0200
@@ -382,9 +382,8 @@
 			goto out;
 		}
 
-		sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino);
+		this.len = sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino);
 		this.name = name;
-		this.len = strlen(name);
 		this.hash = SOCK_INODE(sock)->i_ino;
 
 		file->f_dentry = d_alloc(sock_mnt->mnt_sb->s_root, &this);

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2005-06-22 12:56 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <B1508D50A0692F42B217C22C02D84972067F0805@NT-IRVA-0741.brcm.ad.broadcom.com>
2005-06-14 15:46 ` [PATCH] tg3_msi() and weakly ordered memory Grant Grundler
     [not found]   ` <1118771563.7059.30.camel@rh4>
     [not found]     ` <20050614211530.GB25516@esmail.cup.hp.com>
2005-06-21 23:56       ` David S. Miller
2005-06-22  5:20         ` Grant Grundler
2005-06-22 12:56         ` [PATCH] dont use strlen() but the result from a prior sprintf() Eric Dumazet
     [not found] <B1508D50A0692F42B217C22C02D84972067F0804@NT-IRVA-0741.brcm.ad.broadcom.com>
2005-06-14 15:40 ` [PATCH] tg3_msi() and weakly ordered memory Grant Grundler
     [not found]   ` <1118767397.7059.19.camel@rh4>
2005-06-14 18:04     ` Grant Grundler
2005-06-14 17:55 ` Grant Grundler
2005-06-14  3:37 Grant Grundler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).