* e1000e er32(TIMINCA) value returned 0 Virtual Machiens
@ 2016-02-07 15:28 Thomas Elliott
2016-02-07 22:04 ` Richard Cochran
0 siblings, 1 reply; 3+ messages in thread
From: Thomas Elliott @ 2016-02-07 15:28 UTC (permalink / raw)
To: netdev
Bug potentially specific to VM's, particularly in this case VMWare 6.0.
Issue was found to occur when a VMWare virtual machine was setup to
operate as OS type Windows 8 or Windows 10.
The issue is the NIC defaults, in this setup, to e1000e Driver.
>From a Kernel 4.4.0 build, but this happens from mainline all the way
back as far as Kernel 3.19 (possibly further).
The specifics of the problem from a serial output are:
Linux version 4.4.0 (root@debian64) (gcc version 4.9.2 (Debian
4.9.2-10) ) #2 SMP Mon Jan 25 12:44:48 EST 2016
ACPI: RSDP 0x00000000000F6AC0 000024 (v02 PTLTD )
ACPI: XSDT 0x000000003FEF114C 00005C (v01 INTEL 440BX 06040000 VMW
01324272)
ACPI: FACP 0x000000003FEFEE73 0000F4 (v04 INTEL 440BX 06040000 PTL
000F4240)
ACPI: DSDT 0x000000003FEF13B4 00DABF (v01 PTLTD Custom 06040000
MSFT 03000001)
ACPI: FACS 0x000000003FEFFFC0 000040
ACPI: FACS 0x000000003FEFFFC0 000040
ACPI: BOOT 0x000000003FEF138C 000028 (v01 PTLTD $SBFTBL$ 06040000
LTP 00000001)
ACPI: APIC 0x000000003FEF133C 000050 (v01 PTLTD ? APIC 06040000
LTP 00000000)
ACPI: MCFG 0x000000003FEF1300 00003C (v01 PTLTD $PCITBL$ 06040000
LTP 00000001)
ACPI: SRAT 0x000000003FEF1248 0000B8 (v02 VMWARE MEMPLUG 06040000 VMW
00000001)
ACPI: HPET 0x000000003FEF1210 000038 (v01 VMWARE VMW HPET 06040000 VMW
00000001)
ACPI: WAET 0x000000003FEF11E8 000028 (v01 VMWARE VMW WAET 06040000 VMW
00000001)
Kernel command line: loglevel=6 init=/sbin/init initrd=init.xz
root=/dev/ram0 rw ramdisk_size=127000 keymap= web=10.0.7.1/fog/
consoleblank=0 mac=00:0c:29:38:ec:42 ftp=10.2.1.5
storage=10.2.1.5:/images/ storageip=10.2.1.5 web=10.0.7.1/fog/ osid=50
consoleblank=0 irqpoll console=ttyS0,115200 console=tty0
hostname=ARCHTEST chkdsk=0 img=arch64 imgType=n imgPartitionType=all
imgid=5 imgFormat= PIGZ_COMP=-6 hostearly=1 mining=1 miningcores=1
miningpath=http://fogproject.org/fogpackage.zip type=down
Misrouted IRQ fixup and polling support enabled
This may significantly impact system performance
ACPI: 1 ACPI AML tables successfully acquired and loaded
perf_event_intel: CPUID marked event: 'cpu cycles' unavailable
perf_event_intel: CPUID marked event: 'instructions' unavailable
perf_event_intel: CPUID marked event: 'bus cycles' unavailable
perf_event_intel: CPUID marked event: 'cache references' unavailable
perf_event_intel: CPUID marked event: 'cache misses' unavailable
perf_event_intel: CPUID marked event: 'branch instructions' unavailable
perf_event_intel: CPUID marked event: 'branch misses' unavailable
[Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
ACPI: Enabled 2 GPEs in block 00 to 0F
SCSI subsystem initialized
FS-Cache: Loaded
FS-Cache: Netfs 'nfs' registered for caching
NFS: Registering the id_resolver key type
Key type id_resolver registered
Key type id_legacy registered
FS-Cache: Netfs 'cifs' registered for caching
Key type cifs.idmap registered
Warning: Processor Platform Limit event detected, but not handled.
Consider compiling CPUfreq support into your kernel.
Error creating debugfs parent
Loading Adaptec I2O RAID: Version 2.4 Build 5go
aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.3 loaded
scsi: <fdomain> Detection failed (no card)
iscsi: registered transport (qla4xxx)
GDT-HA: Storage RAID Controller Driver. Version: 3.05
3ware Storage Controller device driver for Linux v1.26.02.003.
3ware 9000 Storage Controller device driver for Linux v2.26.02.014.
scsi 0:0:0:0: Direct-Access VMware Virtual disk 1.0 PQ: 0 ANSI: 2
sd 0:0:0:0: [sda] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Cache data unavailable
sd 0:0:0:0: [sda] Assuming drive cache: write through
sd 0:0:0:0: [sda] Attached SCSI disk
scsi 2:0:0:0: CD-ROM NECVMWar VMware IDE CDR10 1.00 PQ: 0 ANSI: 5
cxgb4vf: could not create debugfs entry, continuing
v1.01-e (2.4 port) Sep-11-2006 Donald Becker <becker@scyld.com>
http://www.scyld.com/network/drivers.html
divide error: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.0 #2
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 09/30/2014
task: ffff88003e4b8000 ti: ffff88003e4c0000 task.ti: ffff88003e4c0000
RIP: 0010:[<ffffffff8172817a>] [<ffffffff8172817a>] 0xffffffff8172817a
RSP: 0000:ffff88003e4c3cf0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880038cdf640 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880038cdf628
RBP: ffff880038cdf628 R08: 0000000000000032 R09: 0000000000000000
R10: 00000007ffffffff R11: 00000000070f8406 R12: 142fe5b9982e5912
R13: ffff880038cdcc38 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88003ea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001f74000 CR4: 00000000000006b0
Stack:
ffffffff81071eca ffff880038cdc780 0000000000000000 0000000000000000
ffffffff8172ec3c 01a000002252a32c ffff880038cdc780 ffff880038cdcc38
0000000000008000 ffff880038cdcc38 0000000000000003 ffff880038cdc000
Call Trace:
[<ffffffff81071eca>] ? 0xffffffff81071eca
[<ffffffff8172ec3c>] ? 0xffffffff8172ec3c
[<ffffffff8172f5ec>] ? 0xffffffff8172f5ec
[<ffffffff817301d9>] ? 0xffffffff817301d9
[<ffffffff8133b2cd>] ? 0xffffffff8133b2cd
[<ffffffff813b272c>] ? 0xffffffff813b272c
[<ffffffff813b28eb>] ? 0xffffffff813b28eb
[<ffffffff813b2898>] ? 0xffffffff813b2898
[<ffffffff813b0fa9>] ? 0xffffffff813b0fa9
[<ffffffff813b1f99>] ? 0xffffffff813b1f99
[<ffffffff813b2e12>] ? 0xffffffff813b2e12
[<ffffffff820854ff>] ? 0xffffffff820854ff
[<ffffffff8100037d>] ? 0xffffffff8100037d
[<ffffffff82055e52>] ? 0xffffffff82055e52
[<ffffffff81aac107>] ? 0xffffffff81aac107
[<ffffffff81aac10c>] ? 0xffffffff81aac10c
[<ffffffff81ab07cf>] ? 0xffffffff81ab07cf
[<ffffffff81aac107>] ? 0xffffffff81aac107
Code: 18 d6 ff ff 8b 80 00 b6 00 00 48 8b 8f 18 d6 ff ff 8b 89 04 b6
00 00 48 c1 e1 20 89 c0 48 09 c1 49 89 c9 49 29 d1 31 d2 4c 89 c8 <48>
f7 f6 48 85 d2 75 05 4d 39 d1 76 08 41 ff c8 48 89 ca 75 bd
RIP [<ffffffff8172817a>] 0xffffffff8172817a
RSP <ffff88003e4c3cf0>
---[ end trace 5900358cb1efc29f ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
While I do understand that this is a problem at the VM software level,
it seems to appear in more than just VMWare. We've seen similar
issues reported from Proxmox, VirtualBox, and even OpenVM.
A proposed fix is to check if TIMINCA is returned with 0, as division
by 0 seems to be the reasoning for the panic altogether.
As I understand this isn't a "normal" situation for physical boards,
it still seems a bit rought to always expect physical boards will
NEVER return 0 for this situation.
A potential patch to fix this can be done with a single line of code.
All this does is check if the value of incvalue is 0 and return systim
if it is. This means you're not going to run into a situation and is
just plain, in my opinion, better error checking. A single line of
code that allows VMs, and possibly future hardware that might present
this issue, from panicking over something that is so simple a check.
Patch from 4.4.1 kernel follows:
--- a/drivers/net/ethernet/intel/e1000e/netdev.c 2016-02-07
09:42:33.493965436 -0500
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c 2016-02-07
09:43:16.853965023 -0500
@@ -4313,6 +4313,7 @@ static cycle_t e1000e_cyclecounter_read(
* rate and is a multiple of incvalue
*/
incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK;
+ if (incvalue == 0) return systim;
for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) {
/* latch SYSTIMH on read of SYSTIML */
systim_next = (cycle_t)er32(SYSTIML);
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: e1000e er32(TIMINCA) value returned 0 Virtual Machiens
2016-02-07 15:28 e1000e er32(TIMINCA) value returned 0 Virtual Machiens Thomas Elliott
@ 2016-02-07 22:04 ` Richard Cochran
2016-02-07 22:41 ` Thomas Elliott
0 siblings, 1 reply; 3+ messages in thread
From: Richard Cochran @ 2016-02-07 22:04 UTC (permalink / raw)
To: Thomas Elliott; +Cc: netdev
On Sun, Feb 07, 2016 at 10:28:48AM -0500, Thomas Elliott wrote:
> task: ffff88003e4b8000 ti: ffff88003e4c0000 task.ti: ffff88003e4c0000
> RIP: 0010:[<ffffffff8172817a>] [<ffffffff8172817a>] 0xffffffff8172817a
> RSP: 0000:ffff88003e4c3cf0 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff880038cdf640 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880038cdf628
> RBP: ffff880038cdf628 R08: 0000000000000032 R09: 0000000000000000
> R10: 00000007ffffffff R11: 00000000070f8406 R12: 142fe5b9982e5912
> R13: ffff880038cdcc38 R14: 0000000000000000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff88003ea00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000001f74000 CR4: 00000000000006b0
> Stack:
> ffffffff81071eca ffff880038cdc780 0000000000000000 0000000000000000
> ffffffff8172ec3c 01a000002252a32c ffff880038cdc780 ffff880038cdcc38
> 0000000000008000 ffff880038cdcc38 0000000000000003 ffff880038cdc000
> Call Trace:
> [<ffffffff81071eca>] ? 0xffffffff81071eca
Are you sure about the which funtion throws this bug? KALLSYMS?
> A proposed fix is to check if TIMINCA is returned with 0, as division
> by 0 seems to be the reasoning for the panic altogether.
Divide by zero is indeed a bug, but the question is, why does this
happen?
> As I understand this isn't a "normal" situation for physical boards,
> it still seems a bit rought to always expect physical boards will
> NEVER return 0 for this situation.
That register is set to a non-zero value in e1000e_config_hwtstamp,
which is called from e1000_probe via e1000e_reset. So it appears to
be initialized.
> All this does is check if the value of incvalue is 0 and return systim
> if it is. This means you're not going to run into a situation and is
> just plain, in my opinion, better error checking. A single line of
> code that allows VMs, and possibly future hardware that might present
> this issue, from panicking over something that is so simple a check.
This is only papering only the problem. We need to know how TIMINCA
is getting cleared to zero.
Thanks,
Richard
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: e1000e er32(TIMINCA) value returned 0 Virtual Machiens
2016-02-07 22:04 ` Richard Cochran
@ 2016-02-07 22:41 ` Thomas Elliott
0 siblings, 0 replies; 3+ messages in thread
From: Thomas Elliott @ 2016-02-07 22:41 UTC (permalink / raw)
To: Richard Cochran; +Cc: netdev
I don't know how TIMINCA is getting cleared to zero. All I know is
this situation apparently occurs with VM's. And while I do agree a
fix from the VM guys would be necessary, this simple one liner fixes
it, because SOMETHING is causing it to get set to/as/reset to zero.
If there's nothing to be done fine, I just thought a simple one line
that checks the incvalue BEFORE it panics a kernel would be useful.
I provided what I've been able to find and a potential solution to
what ailed some of what I've experienced. I don't understand why it's
happening, I just know that it IS happening.
On Sun, Feb 7, 2016 at 5:04 PM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Sun, Feb 07, 2016 at 10:28:48AM -0500, Thomas Elliott wrote:
>> task: ffff88003e4b8000 ti: ffff88003e4c0000 task.ti: ffff88003e4c0000
>> RIP: 0010:[<ffffffff8172817a>] [<ffffffff8172817a>] 0xffffffff8172817a
>> RSP: 0000:ffff88003e4c3cf0 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: ffff880038cdf640 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880038cdf628
>> RBP: ffff880038cdf628 R08: 0000000000000032 R09: 0000000000000000
>> R10: 00000007ffffffff R11: 00000000070f8406 R12: 142fe5b9982e5912
>> R13: ffff880038cdcc38 R14: 0000000000000000 R15: 0000000000000000
>> FS: 0000000000000000(0000) GS:ffff88003ea00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: 0000000000000000 CR3: 0000000001f74000 CR4: 00000000000006b0
>> Stack:
>> ffffffff81071eca ffff880038cdc780 0000000000000000 0000000000000000
>> ffffffff8172ec3c 01a000002252a32c ffff880038cdc780 ffff880038cdcc38
>> 0000000000008000 ffff880038cdcc38 0000000000000003 ffff880038cdc000
>> Call Trace:
>> [<ffffffff81071eca>] ? 0xffffffff81071eca
>
> Are you sure about the which funtion throws this bug? KALLSYMS?
>
>> A proposed fix is to check if TIMINCA is returned with 0, as division
>> by 0 seems to be the reasoning for the panic altogether.
>
> Divide by zero is indeed a bug, but the question is, why does this
> happen?
>
>> As I understand this isn't a "normal" situation for physical boards,
>> it still seems a bit rought to always expect physical boards will
>> NEVER return 0 for this situation.
>
> That register is set to a non-zero value in e1000e_config_hwtstamp,
> which is called from e1000_probe via e1000e_reset. So it appears to
> be initialized.
>
>> All this does is check if the value of incvalue is 0 and return systim
>> if it is. This means you're not going to run into a situation and is
>> just plain, in my opinion, better error checking. A single line of
>> code that allows VMs, and possibly future hardware that might present
>> this issue, from panicking over something that is so simple a check.
>
> This is only papering only the problem. We need to know how TIMINCA
> is getting cleared to zero.
>
> Thanks,
> Richard
--
V/R
Thomas G. Elliott
247 Sugar Hill Road
Crown Point, NY 12928
Home: 518-907-4327 (Preferred)
Cell: 518-335-8682
E-mail: tommygunsster@gmail.com (Preferred)
Alt: thomas@mastacontrola.com
Alt2: thomas.elliott@mastacontrola.com
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2016-02-07 22:41 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-07 15:28 e1000e er32(TIMINCA) value returned 0 Virtual Machiens Thomas Elliott
2016-02-07 22:04 ` Richard Cochran
2016-02-07 22:41 ` Thomas Elliott
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).