Message-ID: <57160BF8.6020001@redhat.com>
Date: Tue, 19 Apr 2016 12:44:08 +0200
From: Denys Vlasenko
To: Jeff Kirsher, Jesse Brandeburg, Shannon Nelson, Carolyn Wyborny, Don Skidmore, Bruce Allan, John Ronciak, Mitch Williams, LKML
Subject: e1000e: can TIMINCA register be zero?

Hello,

I have a user report of a division by zero in
e1000e_cyclecounter_read+0xd9/0x100 at modprobe time:

 [] timecounter_init+0x24/0x40
 [] e1000e_config_hwtstamp+0x1c4/0x2e0 [e1000e]
 [] e1000e_reset+0x1c5/0x7a0 [e1000e]
 [] e1000_probe+0xa2f/0xc7e [e1000e]
 [] local_pci_probe+0x17/0x20
 [] pci_device_probe+0x101/0x120
 [] ? driver_sysfs_add+0x62/0x90
 [] driver_probe_device+0xaa/0x3a0
 [] __driver_attach+0xab/0xb0
 [] ? __driver_attach+0x0/0xb0
 [] bus_for_each_dev+0x64/0x90
 [] driver_attach+0x1e/0x20
 [] bus_add_driver+0x1e8/0x2b0
 [] driver_register+0x5f/0xe0
 [] __pci_register_driver+0x56/0xd0
 [] ? e1000_init_module+0x0/0x43 [e1000e]
 [] e1000_init_module+0x41/0x43 [e1000e]
 [] do_one_initcall+0xc0/0x280
 [] sys_init_module+0xe1/0x250
 [] system_call_fastpath+0x16/0x1b

The user says it happens on hotplug.
On code inspection, this is clearly a case of
(er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK) == 0:

        /* errata for 82574/82583 possible bad bits read from SYSTIMH/L
         * check to see that the time is incrementing at a reasonable
         * rate and is a multiple of incvalue
         */
==>     incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK;
        for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) {
                /* latch SYSTIMH on read of SYSTIML */
                systim_next = (cycle_t)er32(SYSTIML);
                systim_next |= (cycle_t)er32(SYSTIMH) << 32;

                time_delta = systim_next - systim;
                temp = time_delta;
====>           rem = do_div(temp, incvalue);

                systim = systim_next;

                if ((time_delta < E1000_82574_SYSTIM_EPSILON) &&
                    (rem == 0))
                        break;
        }

Knowing nothing about e1000e, I could easily slap on a quick fix here:

        rem = incvalue ? do_div(temp, incvalue) : (time_delta != 0);

However, I'd like to alert you guys that this has been seen in the field.
Would a zero counter increment in TIMINCA cause problems elsewhere?

In e1000e_config_hwtstamp(), TIMINCA is initialized before
timecounter_init() is called:

        /* Get and set the System Time Register SYSTIM base frequency */
        ret_val = e1000e_get_base_timinca(adapter, &regval);
        if (ret_val)
                return ret_val;

==>     ew32(TIMINCA, regval);

        /* reset the ns time counter */
==>     timecounter_init(&adapter->tc, &adapter->cc,
                         ktime_to_ns(ktime_get_real()));

By code inspection, e1000e_get_base_timinca() either returns -EINVAL, in
which case we skip timecounter_init() and never reach the division, or it
sets regval to a nonzero value. We then write that nonzero value to
TIMINCA. Isn't it fishy that timecounter_init() -> e1000e_cyclecounter_read()
-> er32(TIMINCA) then reads back zero?
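A minimal userspace sketch (not driver code) of the re-read loop with the quick fix applied, to show that incvalue == 0 no longer reaches the division. Everything here is a hypothetical stand-in: do_div_emul() emulates the kernel's do_div() (divide in place, return the remainder), struct fake_systim and read_systim() replace the er32(SYSTIML/SYSTIMH) reads, and MAX_SYSTIM_REREADS / SYSTIM_EPSILON are placeholders for E1000_MAX_82574_SYSTIM_REREADS / E1000_82574_SYSTIM_EPSILON with illustrative values, not ones taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholders for E1000_MAX_82574_SYSTIM_REREADS and
 * E1000_82574_SYSTIM_EPSILON; values are illustrative assumptions. */
#define MAX_SYSTIM_REREADS 50
#define SYSTIM_EPSILON (1ULL << 35)

/* Userspace stand-in for the kernel's do_div(): divides *n by base in
 * place and returns the remainder. base must be nonzero. */
static uint32_t do_div_emul(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Hypothetical fake SYSTIM source replacing the er32(SYSTIML/H) reads. */
struct fake_systim {
	const uint64_t *samples;
	size_t count;
	size_t next;
};

/* Returns the next sample; once exhausted, keeps returning the last one,
 * i.e. the counter appears frozen. */
static uint64_t read_systim(struct fake_systim *hw)
{
	assert(hw->count > 0);
	if (hw->next < hw->count)
		return hw->samples[hw->next++];
	return hw->samples[hw->count - 1];
}

/* Re-read loop modelled on the driver excerpt, with the guarded
 * remainder from the quick fix. Stores the last SYSTIM value read in
 * *out and returns the loop index at which it stopped
 * (MAX_SYSTIM_REREADS if no sample was accepted). */
static int reread_systim(struct fake_systim *hw, uint64_t systim,
			 uint32_t incvalue, uint64_t *out)
{
	int i;

	for (i = 0; i < MAX_SYSTIM_REREADS; i++) {
		uint64_t systim_next = read_systim(hw);
		uint64_t time_delta = systim_next - systim;
		uint64_t temp = time_delta;
		/* incvalue == 0: skip the division; only a zero delta
		 * counts as "no remainder". */
		uint64_t rem = incvalue ? do_div_emul(&temp, incvalue)
					: (uint64_t)(time_delta != 0);

		systim = systim_next;
		if (time_delta < SYSTIM_EPSILON && rem == 0)
			break;
	}
	*out = systim;
	return i;
}
```

For example, with incvalue == 0 and a counter that keeps ticking (samples 10, 20, 30 starting from 0), the sketch keeps re-reading until it sees a zero delta instead of trapping in the division. This only avoids the trap; it does not explain why TIMINCA reads back as zero.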