* Re: intermittant petabyte usage reported with broadcom nic
  [not found] <20070402014319.GA8345@zip.com.au>
From: Andrew Morton @ 2007-04-02 7:13 UTC
To: CaT; +Cc: linux-kernel, netdev

On Mon, 2 Apr 2007 11:43:19 +1000 CaT <cat@zip.com.au> wrote:

> I take minute by minute snapshots of network traffic by sampling
> /proc/net/dev and most of the time everything works fine. Occasionally
> though I get petabyte byte traffic and corresponding packet traffic.

How frequently?

Are you able to provide some actual numbers (expected and actual values),
so we can look at the bit patterns?

> This happens on an AMD64, dual core smp box with Broadcom NetXtreme II
> nics.

What driver drives that? b44.c?

> The issue happens with both nics but at different times. The same
> sampling code runs on p4 boxes with ht on and e1000 nics without issues
> so I don't believe it's an issue with my code (famous last words :)
> which just does an re to extract the data on a per-line basis and prints
> it out. Still, I'll be adding code to log any big readings and hopefully
> it'll happen again sooner rather than later.
>
> There is no preemption involved and the kernel is a monolithic build of
> 2.6.19.[12] (there are two servers).

We do perform racy 64-bit updates of some of the stats counters. But
that'll only affect 32-bit kernels and I'm assuming you're running a 64-bit
kernel on that AMD64 box (are you?)

Plus it's odd that both the byte-counters and the packet-counters go wonky
at the same time.
* Re: intermittant petabyte usage reported with broadcom nic
From: CaT @ 2007-04-02 7:41 UTC
To: Andrew Morton; +Cc: linux-kernel, netdev

On Mon, Apr 02, 2007 at 12:13:00AM -0700, Andrew Morton wrote:
> On Mon, 2 Apr 2007 11:43:19 +1000 CaT <cat@zip.com.au> wrote:
>
> > I take minute by minute snapshots of network traffic by sampling
> > /proc/net/dev and most of the time everything works fine. Occasionally
> > though I get petabyte byte traffic and corresponding packet traffic.
>
> How frequently?

I can count about 6 over the past month.

> Are you able to provide some actual numbers (expected and actual values),
> so we can look at the bit patterns?

I have them in an rrd file. I think though that the numbers will be
'adjusted' to fit in with the timekeeping. The logging code I've added
should provide exact numbers as it'll just dump what it reads from /proc
into syslog.

> > This happens on an AMD64, dual core smp box with Broadcom NetXtreme II
> > nics.
>
> What driver drives that? b44.c?

bnx2

> > The issue happens with both nics but at different times. The same
> > sampling code runs on p4 boxes with ht on and e1000 nics without issues
> > so I don't believe it's an issue with my code (famous last words :)
> > which just does an re to extract the data on a per-line basis and prints
> > it out. Still, I'll be adding code to log any big readings and hopefully
> > it'll happen again sooner rather than later.
> >
> > There is no preemption involved and the kernel is a monolithic build of
> > 2.6.19.[12] (there are two servers).
>
> We do perform racy 64-bit updates of some of the stats counters. But
> that'll only affect 32-bit kernels and I'm assuming you're running a 64-bit
> kernel on that AMD64 box (are you?)

Correct. The environment is 64bit clean, though the kernel is compiled
with 32bit support so that I can run static 32bit binaries if need be.

> Plus it's odd that both the byte-counters and the packet-counters go wonky
> at the same time.

If you want I can toss you the rrd graphs that result from the data. The
values do not appear to be static. For example, the recent 2 hits (within
10 minutes of each other) gave almost 3 petabytes and just over 4
petabytes.

Interesting is that the incoming data is driven up to petabytes whilst
the outgoing data hits megabytes at that point. This is consistent and
the server is generally quiet.

--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
	- High Court Judge Michael Kirby
* Re: intermittant petabyte usage reported with broadcom nic
From: Jean-Daniel Pauget @ 2007-04-02 10:31 UTC
To: CaT; +Cc: Andrew Morton, linux-kernel, netdev

I don't know if a me-too may help you, but I have exactly the same
trouble on a whole set of Dell servers, all with bnx2 drivers (SUSE 10.1
kernel) and values fetched by a homebrew daemon and collected via rrd.

> uname -a
Linux toronto 2.6.16.27-0.6-smp #1 SMP Wed Dec 13 09:34:50 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux

.../...
<6>Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.4.31 (January 19, 2006)
<6>ACPI: PCI Interrupt 0000:09:00.0[A] -> GSI 16 (level, low) -> IRQ 169
<6>usbcore: registered new driver hub
<6>eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem f4000000, IRQ 169, node addr 0015c5f18146
<6>ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 16 (level, low) -> IRQ 169
<6>eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem f8000000, IRQ 169, node addr 0015c5f18144
.../...

On Mon, Apr 02, 2007 at 05:41:08PM +1000, CaT wrote:
> On Mon, Apr 02, 2007 at 12:13:00AM -0700, Andrew Morton wrote:
> > On Mon, 2 Apr 2007 11:43:19 +1000 CaT <cat@zip.com.au> wrote:
> >
> > > I take minute by minute snapshots of network traffic by sampling
> > > /proc/net/dev and most of the time everything works fine. Occasionally
> > > though I get petabyte byte traffic and corresponding packet traffic.

on my side, measurements are taken every 10 seconds.

> > How frequently?
>
> I can count about 6 over the past month.

almost once a day per machine.

> > Are you able to provide some actual numbers (expected and actual
> > values),
> > so we can look at the bit patterns?

I can patch my app in order to give you those exact numbers (I'm afraid
I am not enough of an rrd expert to extract the real past values).

on the other hand, I cannot really test new drivers on those machines
just for these tests.

> > > This happens on an AMD64, dual core smp box with Broadcom NetXtreme
> > > II
> > > nics.
> >
> > What driver drives that? b44.c?
> bnx2
>
> > > The issue happens with both nics but at different times. The same
> > > sampling code runs on p4 boxes with ht on and e1000 nics without issues
> > > so I don't believe it's an issue with my code (famous last words :)

exactly the same, just xeons instead of AMD.

--
Jean-Daniel Pauget
Tél: +33 (0)2 33 17 20 16
2, rue André PELCA
50580 Denneville-Plage
France
* Re: intermittant petabyte usage reported with broadcom nic
From: Michael Chan @ 2007-04-15 0:20 UTC
To: CaT; +Cc: Andrew Morton, linux-kernel, netdev

On Mon, 2007-04-02 at 17:41 +1000, CaT wrote:
> On Mon, Apr 02, 2007 at 12:13:00AM -0700, Andrew Morton wrote:
> > On Mon, 2 Apr 2007 11:43:19 +1000 CaT <cat@zip.com.au> wrote:
> >
> > > I take minute by minute snapshots of network traffic by sampling
> > > /proc/net/dev and most of the time everything works fine. Occasionally
> > > though I get petabyte byte traffic and corresponding packet traffic.
> >
> > How frequently?
>
> I can count about 6 over the past month.
>

I did a quick test on a 64-bit kernel and did not see any problem with
the counters. I'll ask the lab to set up a longer term test and monitor
the counters for bogus values.

I also like Andi's idea of using change_page_attr() to isolate the
problem. I'll try to send you a debug patch in the next few days to try
that out. Thanks.
* Re: intermittant petabyte usage reported with broadcom nic
From: Michael Chan @ 2007-04-16 19:10 UTC
To: CaT; +Cc: Andrew Morton, linux-kernel, netdev

On Sat, 2007-04-14 at 17:20 -0700, Michael Chan wrote:

> I also like Andi's idea of using change_page_attr() to isolate the
> problem. I'll try to send you a debug patch in the next few days to try
> that out. Thanks.
>

Here's the debug patch for x86 only that will change the statistics
memory block to read-only. If the kernel is corrupting it, you should
get a page fault that will crash the system. If you continue to see
bogus counters, it is definitely a firmware or hardware problem. Please
try it and let me know. Thanks.

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 0b7aded..b7d491b 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -47,6 +47,7 @@
 #include <linux/prefetch.h>
 #include <linux/cache.h>
 #include <linux/zlib.h>
+#include <asm/cacheflush.h>
 
 #include "bnx2.h"
 #include "bnx2_fw.h"
@@ -436,6 +437,8 @@ bnx2_free_mem(struct bnx2 *bp)
 		}
 	}
 	if (bp->status_blk) {
+		change_page_attr(virt_to_page(bp->status_blk), 1, PAGE_KERNEL);
+		global_flush_tlb();
 		pci_free_consistent(bp->pdev, bp->status_stats_size,
 				    bp->status_blk, bp->status_blk_mapping);
 		bp->status_blk = NULL;
@@ -501,6 +504,7 @@ bnx2_alloc_mem(struct bnx2 *bp)
 	bp->status_stats_size = status_blk_size +
 				sizeof(struct statistics_block);
 
+	bp->status_stats_size = PAGE_SIZE;
 	bp->status_blk = pci_alloc_consistent(bp->pdev, bp->status_stats_size,
 					      &bp->status_blk_mapping);
 	if (bp->status_blk == NULL)
@@ -508,6 +512,10 @@ bnx2_alloc_mem(struct bnx2 *bp)
 
 	memset(bp->status_blk, 0, bp->status_stats_size);
 
+	/* x86 debug code to see if the kernel is corrupting the statistics */
+	change_page_attr(virt_to_page(bp->status_blk), 1, PAGE_KERNEL_RO);
+	global_flush_tlb();
+
 	bp->stats_blk = (void *) ((unsigned long) bp->status_blk +
 				  status_blk_size);
 
@@ -4307,7 +4315,9 @@ bnx2_timer(unsigned long data)
 	msg = (u32) ++bp->fw_drv_pulse_wr_seq;
 	REG_WR_IND(bp, bp->shmem_base + BNX2_DRV_PULSE_MB, msg);
 
+#if 0
 	bp->stats_blk->stat_FwRxDrop = REG_RD_IND(bp, BNX2_FW_RX_DROP_COUNT);
+#endif
 
 	if (bp->phy_flags & PHY_SERDES_FLAG) {
 		if (CHIP_NUM(bp) == CHIP_NUM_5706)
* Re: intermittant petabyte usage reported with broadcom nic
From: CaT @ 2007-04-16 23:43 UTC
To: Michael Chan; +Cc: Andrew Morton, linux-kernel, netdev, Jean-Daniel Pauget

On Mon, Apr 16, 2007 at 12:10:51PM -0700, Michael Chan wrote:
> On Sat, 2007-04-14 at 17:20 -0700, Michael Chan wrote:
>
> > I also like Andi's idea of using change_page_attr() to isolate the
> > problem. I'll try to send you a debug patch in the next few days to try
> > that out. Thanks.
>
> Here's the debug patch for x86 only that will change the statistics
> memory block to read-only. If the kernel is corrupting it, you should
> get a page fault that will crash the system. If you continue to see
> bogus counters, it is definitely a firmware or hardware problem. Please
> try it and let me know. Thanks.

Ahh. Would truly love to but the moment you said 'crash the system' I
had to bail. These boxes are in production and as such a crash would be,
shall we say, unwelcome. I might be able to finagle something but I very
much doubt it.

Perhaps Jean-Daniel, who is also experiencing this problem and seemingly
more frequently than I, has a box that he could run your patch on. I
think we both run pretty much the same hardware (Dell [12]950s). I've
CCed him.

--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
	- High Court Judge Michael Kirby
* Re: intermittant petabyte usage reported with broadcom nic
From: Jean-Daniel Pauget @ 2007-04-17 12:01 UTC
To: CaT; +Cc: Michael Chan, Andrew Morton, linux-kernel, netdev

On Tue, Apr 17, 2007 at 09:43:48AM +1000, CaT wrote:
> On Mon, Apr 16, 2007 at 12:10:51PM -0700, Michael Chan wrote:
> > On Sat, 2007-04-14 at 17:20 -0700, Michael Chan wrote:
> >
> > Here's the debug patch for x86 only that will change the statistics
> > memory block to read-only. If the kernel is corrupting it, you should
> > get a page fault that will crash the system. If you continue to see
> > bogus counters, it is definitely a firmware or hardware problem. Please
> > try it and let me know. Thanks.

[.../...]

> Perhaps Jean-Daniel, who is also experiencing this problem and seemingly
> more frequently than I, has a box that he could run your patch on. I
> think we both run pretty much the same hardware (Dell [12]950s).

Dell 1950/2950 indeed...

if there is any way to catch that write without crashing the system
(even at the price of some slowness) I can test it.

if not, I can't, because all my available targets are remotely
administered and involved in production processes.

if by luck one of them becomes free, I'll apply the latest patch you
provide. I may also try it one day when I'm close to those machines, so
keep me on the list for up-to-date patches.

--
Jean-Daniel Pauget
Tél: +33 (0)2 33 17 20 16
2, rue André PELCA
50580 Denneville-Plage
France
* Re: intermittant petabyte usage reported with broadcom nic
From: Roland Dreier @ 2007-04-17 15:58 UTC
To: CaT; +Cc: Michael Chan, Andrew Morton, linux-kernel, netdev, Jean-Daniel Pauget

I actually have a couple of Dell 1950 systems with bnx2 NICs too, which
I use for kernel development (ie one more crash is fine :)

If someone can give me an idea for what kind of load to use, I can try
this patch out to see if it triggers.

 - R.
* Re: intermittant petabyte usage reported with broadcom nic
From: Michael Chan @ 2007-05-22 1:15 UTC
To: CaT, jd; +Cc: Andrew Morton, linux-kernel, netdev

On Mon, 2 Apr 2007 11:43:19 +1000 CaT <cat@zip.com.au> wrote:
>
> I take minute by minute snapshots of network traffic by sampling
> /proc/net/dev and most of the time everything works fine. Occasionally
> though I get petabyte byte traffic and corresponding packet traffic.

We were able to reproduce the problem and confirmed that it was a DMA
problem of the statistics block. About once an hour on average, wrong
counter values will be DMA'ed to host memory. Luckily, the DMA write
stays within the intended address range so it will not corrupt other
parts of memory. Other types of DMA including traffic and buffer
descriptors are not affected.

If you happen to be reading /proc/net/dev within a second after the DMA
corruption, you'll see bogus counters. One second later and until the
next bad DMA, the counters will be normal again.

We are considering ways to work around the problem. Thanks.
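Since the bogus values are reported to last only about a second, a monitoring script can guard against them on its own side. The following is a minimal Python sketch of that idea, not anything proposed in the thread: the function names, the 1 Gbit/s link speed (taken from the 1000Base-T NICs mentioned earlier), the retry count and the 1.5 second delay are all assumptions for illustration.

```python
import time


def is_plausible(prev, cur, interval_s, link_bits_per_s=1_000_000_000):
    """Return True if byte counter `cur` could follow `prev` after interval_s.

    Assumption: a 1 Gbit/s link, so at most link/8 bytes per second.
    Counter wraps and reboots are deliberately not handled here.
    """
    if cur < prev:
        return False
    max_bytes = link_bits_per_s / 8 * interval_s
    return cur - prev <= max_bytes


def read_with_retry(read_counter, prev, interval_s,
                    retries=2, delay_s=1.5, sleep=time.sleep):
    """Read a counter, re-reading after a short pause if it looks bogus.

    `read_counter` is a caller-supplied (hypothetical) function returning
    the current byte count. Returns (value, ok); ok is False when every
    read looked implausible and the caller should discard the sample.
    """
    waited = 0.0
    cur = read_counter()
    for _ in range(retries):
        if is_plausible(prev, cur, interval_s + waited):
            return cur, True
        sleep(delay_s)          # give the next good DMA time to land
        waited += delay_s
        cur = read_counter()
    return cur, is_plausible(prev, cur, interval_s + waited)
```

A sampler would call `read_with_retry` once per interval with the previous good reading as `prev`, and skip the data point when `ok` comes back False rather than feeding it to rrd.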
* Re: intermittant petabyte usage reported with broadcom nic
From: CaT @ 2007-04-12 22:52 UTC
To: Andrew Morton; +Cc: linux-kernel, netdev

On Mon, Apr 02, 2007 at 12:13:00AM -0700, Andrew Morton wrote:
> On Mon, 2 Apr 2007 11:43:19 +1000 CaT <cat@zip.com.au> wrote:
>
> > I take minute by minute snapshots of network traffic by sampling
> > /proc/net/dev and most of the time everything works fine. Occasionally
> > though I get petabyte byte traffic and corresponding packet traffic.
>
> How frequently?
>
> Are you able to provide some actual numbers (expected and actual values),
> so we can look at the bit patterns?

I have some now. These are raw lines from /proc/net/dev. In this case it's
eth0 at 22:14 that chucked a wee wibbly.

Apr 11 22:13:02 ' eth0:17227166357 81379716 0 0 0 0 0 0 33090495625 86656584 0 0 0 0 0 0 '
Apr 11 22:13:02 ' eth1:30708022097 91219466 0 0 0 0 0 0 122989582024 125073786 0 0 0 0 0 0 '
Apr 11 22:14:02 ' eth0:220898233988841368 66750274 0 0 0 0 0 86458738 52386430545 101089219 199313 0 0 0 199313 0 '
Apr 11 22:14:02 ' eth1:30708307787 91220183 0 0 0 0 0 0 122989665004 125074344 0 0 0 0 0 0 '
Apr 11 22:15:02 ' eth0:17227454818 81381144 0 0 0 0 0 0 33091307388 86658381 0 0 0 0 0 0 '
Apr 11 22:15:02 ' eth1:30708569308 91220742 0 0 0 0 0 0 122989732601 125074712 0 0 0 0 0 0 '

On another server (same hardware except for 2ru case, more ram and more hds):

Apr 9 06:18:05 ' eth0:1556640056941 3598105481 0 0 0 0 0 0 2281147324747 3318270401 0 0 0 0 0 0 '
Apr 9 06:18:05 ' eth1:912389249044 1190286687 0 0 0 0 0 0 642943095469 991257887 0 0 0 0 0 0 '
Apr 9 06:19:04 ' eth0:14250798570591813804 2284720007938 18638 0 0 18638 0 27375938 1556640980159 3345714490 0 0 0 0 0 0 '
Apr 9 06:19:04 ' eth1:912389281939 1190287072 0 0 0 0 0 0 642943219035 991258183 0 0 0 0 0 0 '
Apr 9 06:20:05 ' eth0:1556643514710 3598121584 0 0 0 0 0 0 2281154391794 3318284878 0 0 0 0 0 0 '
Apr 9 06:20:05 ' eth1:912389305767 1190287354 0 0 0 0 0 0 642943273879 991258351 0 0 0 0 0 0 '

> > This happens on an AMD64, dual core smp box with Broadcom NetXtreme II
> > nics.
>
> What driver drives that? b44.c?

To clarify, it's an Intel Dual Core Xeon (I just wound up thinking of
them all as amd64s). Network card driver in use is the one defined by
CONFIG_BNX2. Kernel's monolithic.

> We do perform racy 64-bit updates of some of the stats counters. But
> that'll only affect 32-bit kernels and I'm assuming you're running a 64-bit
> kernel on that AMD64 box (are you?)

Yes. With 32bit compat for executables built in.

--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
	- High Court Judge Michael Kirby
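For readers following the raw lines above, it may help to see which column is which. This is a small Python sketch, not the poster's sampling code: it assumes the usual /proc/net/dev column order (eight receive fields followed by eight transmit fields), and the function name `parse_net_dev_line` is made up for illustration.

```python
# Column order of /proc/net/dev after the "iface:" prefix.
RX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed")


def parse_net_dev_line(line):
    """Split one interface line into (name, rx_dict, tx_dict).

    Python integers are arbitrary precision, so the oversized
    counters in the log parse without overflow.
    """
    name, sep, rest = line.partition(":")
    values = [int(v) for v in rest.split()]
    if not sep or len(values) != len(RX_FIELDS) + len(TX_FIELDS):
        raise ValueError("unexpected /proc/net/dev line: %r" % line)
    rx = dict(zip(RX_FIELDS, values[:len(RX_FIELDS)]))
    tx = dict(zip(TX_FIELDS, values[len(RX_FIELDS):]))
    return name.strip(), rx, tx
```

Applied to the bad 22:14 eth0 sample, this shows that it is not only the receive byte counter that is off: the receive multicast, transmit errs and transmit carrier fields are non-zero too, where the neighbouring samples have zeros.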
* Re: intermittant petabyte usage reported with broadcom nic
From: Andrew Morton @ 2007-04-12 23:13 UTC
To: CaT; +Cc: linux-kernel, netdev, Michael Chan

On Fri, 13 Apr 2007 08:52:49 +1000 CaT <cat@zip.com.au> wrote:

> On Mon, Apr 02, 2007 at 12:13:00AM -0700, Andrew Morton wrote:
> > On Mon, 2 Apr 2007 11:43:19 +1000 CaT <cat@zip.com.au> wrote:
> >
> > > I take minute by minute snapshots of network traffic by sampling
> > > /proc/net/dev and most of the time everything works fine. Occasionally
> > > though I get petabyte byte traffic and corresponding packet traffic.
> >
> > How frequently?
> >
> > Are you able to provide some actual numbers (expected and actual values),
> > so we can look at the bit patterns?
>
> I have some now. These are raw lines from /proc/net/dev. In this case it's
> eth0 at 22:14 that chucked a wee wibbly.
>
> Apr 11 22:13:02 ' eth0:17227166357 81379716 0 0 0 0 0 0 33090495625 86656584 0 0 0 0 0 0 '
> Apr 11 22:13:02 ' eth1:30708022097 91219466 0 0 0 0 0 0 122989582024 125073786 0 0 0 0 0 0 '
> Apr 11 22:14:02 ' eth0:220898233988841368 66750274 0 0 0 0 0 86458738 52386430545 101089219 199313 0 0 0 199313 0 '

0x310_c9c6_006a_7f98

Not sure what to make of that.

> Apr 11 22:14:02 ' eth1:30708307787 91220183 0 0 0 0 0 0 122989665004 125074344 0 0 0 0 0 0 '
> Apr 11 22:15:02 ' eth0:17227454818 81381144 0 0 0 0 0 0 33091307388 86658381 0 0 0 0 0 0 '
> Apr 11 22:15:02 ' eth1:30708569308 91220742 0 0 0 0 0 0 122989732601 125074712 0 0 0 0 0 0 '
>
> On another server (same hardware except for 2ru case, more ram and more hds):
>
> Apr 9 06:18:05 ' eth0:1556640056941 3598105481 0 0 0 0 0 0 2281147324747 3318270401 0 0 0 0 0 0 '
> Apr 9 06:18:05 ' eth1:912389249044 1190286687 0 0 0 0 0 0 642943095469 991257887 0 0 0 0 0 0 '
> Apr 9 06:19:04 ' eth0:14250798570591813804 2284720007938 18638 0 0 18638 0 27375938 1556640980159 3345714490 0 0 0 0 0 0 '

0xc5c5_01cb_c5c5_00ac and 0x213_f3ec_ab02

The first one looks like trashed memory: it got overwritten by kernel
addresses. Except they're x86-32 kernel addresses, and you're running
x86_64 64-bit kernel.

hm. I don't see any pattern here.

> Apr 9 06:19:04 ' eth1:912389281939 1190287072 0 0 0 0 0 0 642943219035 991258183 0 0 0 0 0 0 '
> Apr 9 06:20:05 ' eth0:1556643514710 3598121584 0 0 0 0 0 0 2281154391794 3318284878 0 0 0 0 0 0 '
> Apr 9 06:20:05 ' eth1:912389305767 1190287354 0 0 0 0 0 0 642943273879 991258351 0 0 0 0 0 0 '
>
> > > This happens on an AMD64, dual core smp box with Broadcom NetXtreme II
> > > nics.
> >
> > What driver drivers that? b44.c?
>
> To clarify it's an Intel Dual Core Xeon (I just wound up as thinking of
> them all as amd64s). Network card driver in use is the one defined by
> CONFIG_BNX2. Kernel's monolithic.
>
> > We do perform racy 64-bit updates of some of the stats counters. But
> > that'll only affect 32-bit kernels and I'm assuming you're running a 64-bit
> > kernel on that AMD64 box (are you?)
>
> Yes. With 32bit compat for executables built in.

OK.

I was earlier assuming that you were seeing transient funny numbers. But
in fact I think you're saying that the numbers go bad, and then stay bad.
* Re: intermittant petabyte usage reported with broadcom nic
From: Roland Dreier @ 2007-04-12 23:18 UTC
To: Andrew Morton; +Cc: CaT, linux-kernel, netdev, Michael Chan

 > > Apr 11 22:14:02 ' eth0:220898233988841368 66750274 0 0 0 0 0 86458738 52386430545 101089219 199313 0 0 0 199313 0 '
 > > Apr 11 22:15:02 ' eth0:17227454818 81381144 0 0 0 0 0 0 33091307388 86658381 0 0 0 0 0 0 '

 > But in fact I think you're saying that the numbers go bad, and then stay bad.

Doesn't look like it -- one minute after the first hiccup the eth0 #s
look reasonable again.

 - R.
* Re: intermittant petabyte usage reported with broadcom nic
From: CaT @ 2007-04-12 23:25 UTC
To: Roland Dreier; +Cc: Andrew Morton, linux-kernel, netdev, Michael Chan

On Thu, Apr 12, 2007 at 04:18:24PM -0700, Roland Dreier wrote:
> > > Apr 11 22:14:02 ' eth0:220898233988841368 66750274 0 0 0 0 0 86458738 52386430545 101089219 199313 0 0 0 199313 0 '
> > > Apr 11 22:15:02 ' eth0:17227454818 81381144 0 0 0 0 0 0 33091307388 86658381 0 0 0 0 0 0 '
>
> > But in fact I think you're saying that the numbers go bad, and then stay bad.
>
> Doesn't look like it -- one minute after the first hiccup the eth0 #s
> look reasonable again.

Yeah. Sorry for not making it clear. I included good values on either
side of the bad one.

--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
	- High Court Judge Michael Kirby
* Re: intermittant petabyte usage reported with broadcom nic
From: Roland Dreier @ 2007-04-12 23:15 UTC
To: CaT; +Cc: Andrew Morton, linux-kernel, netdev

 > Apr 9 06:19:04 ' eth0:14250798570591813804 2284720007938 18638 0 0 18638 0 27375938 1556640980159 3345714490 0 0 0 0 0 0 '

One odd thing is that crazy number 14250798570591813804 is
c5c501cbc5c500ac in hex. I dunno what the significance of the 0xc5 bit
pattern is though...

The other line has 220898233988841368, which is 0x310c9c6006a7f98, not
nearly so regular a pattern.

I don't think I'm helping much...
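The decimal-to-hex conversions quoted in this sub-thread are easy to reproduce. Below is a short Python sketch for doing so; the helper name `as_hex64` and the choice to group digits into 16-bit words (matching the underscore notation used elsewhere in the thread) are illustrative choices, not something from the original posts.

```python
def as_hex64(value):
    """Format a 64-bit counter as 16 hex digits in 4-digit groups."""
    if not 0 <= value < 2 ** 64:
        raise ValueError("value does not fit in 64 bits")
    digits = format(value, "016x")  # zero-padded, lowercase hex
    return "_".join(digits[i:i + 4] for i in range(0, 16, 4))


# The two bogus rx byte counters from the logs:
print(as_hex64(14250798570591813804))  # c5c5_01cb_c5c5_00ac
print(as_hex64(220898233988841368))    # 0310_c9c6_006a_7f98
```

Grouping by 16-bit word makes the repeated 0xc5c5 halves in the first value stand out, which is the regularity pointed at above.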
* Re: intermittant petabyte usage reported with broadcom nic
From: Roland Dreier @ 2007-04-12 23:28 UTC
To: CaT; +Cc: Michael Chan, Andrew Morton, linux-kernel, netdev

[Adding Michael Chan, who seems to look after bnx2, to the cc list]

 > To clarify it's an Intel Dual Core Xeon (I just wound up as thinking of
 > them all as amd64s). Network card driver in use is the one defined by
 > CONFIG_BNX2. Kernel's monolithic.

From a quick look at bnx2.c, it seems that the driver gives the NIC
(firmware?) a block of memory to DMA stats into, and just reads from
that memory in its get_stats method. So if you're seeing wonky stats
from the NIC intermittently, my best guess would be that firmware is
occasionally writing junk into the stats block.

 - R.
* Re: intermittant petabyte usage reported with broadcom nic
From: Andi Kleen @ 2007-04-13 1:15 UTC
To: Roland Dreier; +Cc: CaT, Michael Chan, Andrew Morton, linux-kernel, netdev

Roland Dreier <rdreier@cisco.com> writes:

> [Adding Michael Chan, who seems to look after bnx2, to the cc list]
>
> > To clarify it's an Intel Dual Core Xeon (I just wound up as thinking of
> > them all as amd64s). Network card driver in use is the one defined by
> > CONFIG_BNX2. Kernel's monolithic.
>
> From a quick look at bnx2.c, it seems that the driver gives the NIC
> (firmware?) a block of memory to DMA stats into, and just reads from
> that memory in its get_stats method. So if you're seeing wonky stats
> from the NIC intermittently, my best guess would be that firmware is
> occasionally writing junk into the stats block.

When only the firmware is writing to that area, it could be put into its
own page and then write protected with change_page_attr().

That would catch any corruption coming from the rest of the kernel.

-Andi