* e1000e I219 timestamping oops related to TSYNCRXCTL read
@ 2018-04-25 6:52 Benjamin Poirier
2018-04-25 19:59 ` Keller, Jacob E
0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Poirier @ 2018-04-25 6:52 UTC (permalink / raw)
To: Bruce Allan, Yanir Lubetkin, Jacob Keller, Neftin, Sasha
Cc: Alexander Duyck, Jeff Kirsher, Achim Mildenberger, olouvignes,
jayanth, ehabkost, postmodern.mod3, Bart.VanAssche,
intel-wired-lan, netdev
In the following openSUSE bug report
https://bugzilla.suse.com/show_bug.cgi?id=1075876
Achim reported an oops related to e1000e timestamping:
kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel: [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel: [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel: [<ffffffff810992c5>] process_one_work+0x155/0x440
kernel: [<ffffffff81099e16>] worker_thread+0x116/0x4b0
kernel: [<ffffffff8109f422>] kthread+0xd2/0xf0
kernel: [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
It always occurs 4 hours after boot but not on every boot. It is most
likely the same problem reported here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668356
http://lkml.iu.edu/hypermail/linux/kernel/1506.2/index.html#02530
https://bugzilla.redhat.com/show_bug.cgi?id=1463882
https://bugzilla.redhat.com/show_bug.cgi?id=1431863
This occurs with MAC: 12, e1000_pch_spt/I219. The reporter has
reproduced it on a v4.16 derivative.
We've traced it to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL,
which leads to a null deref in timecounter_read() (see comment 8 of the
suse bugzilla for more details.)
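The failure path described above can be modelled in user space. The sketch below is not driver code: every model_* name is a hypothetical stand-in for the kernel's cyclecounter/timecounter structures, and it assumes only what this message states, namely that a failed reset skips the init call and a later read then dereferences a null pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_cyclecounter {
    uint64_t (*read)(const struct model_cyclecounter *cc);
};

struct model_timecounter {
    const struct model_cyclecounter *cc; /* stays NULL until init runs */
    uint64_t nsec;
};

/* Dummy hardware counter read used by the model. */
uint64_t model_read_cycles(const struct model_cyclecounter *cc)
{
    (void)cc;
    return 42;
}

void model_timecounter_init(struct model_timecounter *tc,
                            const struct model_cyclecounter *cc)
{
    tc->cc = cc;
    tc->nsec = 0;
}

/*
 * Models the reset path: when the base increment lookup fails
 * (the -EINVAL case), the init call is skipped and tc->cc stays NULL.
 */
bool model_systim_reset(struct model_timecounter *tc,
                        const struct model_cyclecounter *cc,
                        bool syscfi_set)
{
    if (!syscfi_set)
        return false;
    model_timecounter_init(tc, cc);
    return true;
}

/* True if a later read from the overflow work would not hit NULL. */
bool model_overflow_work_is_safe(const struct model_timecounter *tc)
{
    return tc->cc != NULL && tc->cc->read != NULL;
}
```

In this model, a reset with the bit clear leaves model_overflow_work_is_safe() false, which corresponds to the state in which the periodic work later crashes.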
In commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) Yanir
reworked e1000e_get_base_timinca() in such a way that it can return
-EINVAL for e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
This is also the commit that was identified by bisection in the second
link above (lkml).
What we've observed (in comment 14) is that TSYNCRXCTL reads sometimes
don't have the SYSCFI bit set. Retrying the read shortly after finds the
bit to be set. This was observed at boot (probe) but also link up and
link down.
I have a few questions:
What's the purpose of the SYSCFI bit in TSYNCRXCTL ("Reserved" in the
datasheet)?
Why does it look like subsequent reads of TSYNCRXCTL sometimes have the
SYSCFI bit set/not set on I219?
Is it right to check the SYSCFI bit in e1000e_get_base_timinca() for
_spt and return -EINVAL if it's not set? Could we just remove that
check?
The patch in comment 13 of the suse bugzilla works around the problem by
retrying TSYNCRXCTL reads, maybe we could instead remove that read
altogether or move the timecounter_init() call to at least avoid the
oops. The best approach to take seems to depend on the behavior expected
of TSYNCRXCTL reads on I219 so I'm hoping that you could provide more
info on that.
Thanks,
-Benjamin
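The retry workaround mentioned in the message can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the patch from comment 13 of the bugzilla: the register is simulated, MODEL_SYSCFI_BIT is a hypothetical bit value, and the retry count is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SYSCFI_BIT 0x20u /* hypothetical bit value for the model */

/* Simulated register whose flag only shows up after a few reads. */
struct model_reg {
    unsigned int reads_until_set; /* reads that return the bit clear */
    unsigned int reads_done;      /* reads performed so far */
};

uint32_t model_read_reg(struct model_reg *r)
{
    r->reads_done++;
    return r->reads_done > r->reads_until_set ? MODEL_SYSCFI_BIT : 0;
}

/*
 * Re-read the register up to max_tries times and report whether the
 * flag was ever observed set. A real driver would likely also delay
 * between reads; the model omits that.
 */
bool model_syscfi_with_retry(struct model_reg *r, unsigned int max_tries)
{
    for (unsigned int i = 0; i < max_tries; i++) {
        if (model_read_reg(r) & MODEL_SYSCFI_BIT)
            return true;
    }
    return false;
}
```

A bounded retry hides a transient clear bit, but it still fails if the bit stays clear for longer than the retry budget, which is why the message asks whether the read is needed at all.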
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: e1000e I219 timestamping oops related to TSYNCRXCTL read
2018-04-25 6:52 e1000e I219 timestamping oops related to TSYNCRXCTL read Benjamin Poirier
@ 2018-04-25 19:59 ` Keller, Jacob E
2018-05-10 7:28 ` [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes Benjamin Poirier
0 siblings, 1 reply; 6+ messages in thread
From: Keller, Jacob E @ 2018-04-25 19:59 UTC (permalink / raw)
To: Benjamin Poirier, Allan, Bruce W, Yanir Lubetkin, Neftin, Sasha
Cc: Alexander Duyck, Kirsher, Jeffrey T, Achim Mildenberger,
olouvignes@gmail.com, jayanth@goubiq.com, ehabkost@redhat.com,
postmodern.mod3@gmail.com, Bart.VanAssche@wdc.com,
intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org

Hi Benjamin,

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Benjamin Poirier
> Sent: Tuesday, April 24, 2018 11:53 PM
> To: Allan, Bruce W <bruce.w.allan@intel.com>; Yanir Lubetkin
> <yanirx.lubetkin@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>; Neftin,
> Sasha <sasha.neftin@intel.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; Achim Mildenberger
> <admin@fph.physik.uni-karlsruhe.de>; olouvignes@gmail.com; jayanth@goubiq.com;
> ehabkost@redhat.com; postmodern.mod3@gmail.com;
> Bart.VanAssche@wdc.com; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org
> Subject: e1000e I219 timestamping oops related to TSYNCRXCTL read
>
> In the following openSUSE bug report
> https://bugzilla.suse.com/show_bug.cgi?id=1075876
> Achim reported an oops related to e1000e timestamping:
> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
> [...]
> kernel: Call Trace:
> kernel: [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
> kernel: [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
> kernel: [<ffffffff810992c5>] process_one_work+0x155/0x440
> kernel: [<ffffffff81099e16>] worker_thread+0x116/0x4b0
> kernel: [<ffffffff8109f422>] kthread+0xd2/0xf0
> kernel: [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
>
> It always occurs 4 hours after boot but not on every boot. It is most
> likely the same problem reported here:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668356
> http://lkml.iu.edu/hypermail/linux/kernel/1506.2/index.html#02530
> https://bugzilla.redhat.com/show_bug.cgi?id=1463882
> https://bugzilla.redhat.com/show_bug.cgi?id=1431863
>

It probably occurs due to the systim overflow check, yes.

> This occurs with MAC: 12, e1000_pch_spt/I219. The reporter has
> reproduced it on a v4.16 derivative.
>
> We've traced it to the fact that e1000e_systim_reset() skips the
> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL,
> which leads to a null deref in timecounter_read() (see comment 8 of the
> suse bugzilla for more details.)
>
> In commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) Yanir
> reworked e1000e_get_base_timinca() in such a way that it can return
> -EINVAL for e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
> This is also the commit that was identified by bisection in the second
> link above (lkml).
>
> What we've observed (in comment 14) is that TSYNCRXCTL reads sometimes
> don't have the SYSCFI bit set. Retrying the read shortly after finds the
> bit to be set. This was observed at boot (probe) but also link up and
> link down.
>

I don't know offhand what the SYSCFI bit is for yet still digging into it.

> I have a few questions:
>
> What's the purpose of the SYSCFI bit in TSYNCRXCTL ("Reserved" in the
> datasheet)?
>
> Why does it look like subsequent reads of TSYNCRXCTL sometimes have the
> SYSCFI bit set/not set on I219?
>
> Is it right to check the SYSCFI bit in e1000e_get_base_timinca() for
> _spt and return -EINVAL if it's not set? Could we just remove that
> check?
>

I think the right approach might be proper cleanup when we fail to reset.
I think the problem is that when e1000e_systim_reset is called and fails,
we don't properly cleanup the work items. I think we need to actually stop
and kill the work task so that it won't run.

> The patch in comment 13 of the suse bugzilla works around the problem by
> retrying TSYNCRXCTL reads, maybe we could instead remove that read
> altogether or move the timecounter_init() call to at least avoid the
> oops. The best approach to take seems to depend on the behavior expected
> of TSYNCRXCTL reads on I219 so I'm hoping that you could provide more
> info on that.
>

Yea, we need to do something here, I'm still investigating why we need the
SYSCFI check, but at a minimum, we should disable the overflow check task
if we fail here, I think.

It looks like the SYSCFI is the System Clock Frequency Indicator bit, and
it should be used to tell which of two clock frequencies to choose. I do
not understand why that would be changing at different reads. We do need
to make sure the clock is already enabled, but we do that prior to the
switch case...

Something is really weird here...

Thanks,
Jake

> Thanks,
> -Benjamin
^ permalink raw reply [flat|nested] 6+ messages in thread
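Jake's suggestion, disabling the overflow check when the reset fails, can be sketched as a user-space model. The names below are hypothetical and the boolean flag merely stands in for cancelling the driver's periodic work item; the thread does not show how the driver would actually implement this.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the adapter state relevant to the overflow work. */
struct model_adapter {
    bool clock_ready;         /* true once the time counter is initialised */
    bool overflow_work_armed; /* stands in for the scheduled overflow work */
};

/*
 * Model of a reset path that cleans up on failure: if the base increment
 * lookup failed (timinca_ret < 0), disarm the periodic work so that it
 * can never read an uninitialised time counter.
 */
int model_systim_reset_with_cleanup(struct model_adapter *a, int timinca_ret)
{
    if (timinca_ret < 0) {
        a->clock_ready = false;
        a->overflow_work_armed = false;
        return timinca_ret;
    }
    a->clock_ready = true;
    a->overflow_work_armed = true;
    return 0;
}

/* The overflow work only runs when it is armed and the clock is ready. */
bool model_overflow_work_would_run(const struct model_adapter *a)
{
    return a->overflow_work_armed && a->clock_ready;
}
```

This avoids the oops but leaves the hardware clock unusable after a failed reset, so it addresses the symptom rather than the unstable register read.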
* [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
2018-04-25 19:59 ` Keller, Jacob E
@ 2018-05-10 7:28 ` Benjamin Poirier
2018-05-10 18:42 ` Keller, Jacob E
2018-05-23 0:44 ` Brown, Aaron F
0 siblings, 2 replies; 6+ messages in thread
From: Benjamin Poirier @ 2018-05-10 7:28 UTC (permalink / raw)
To: Jeff Kirsher
Cc: Keller, Jacob E, Achim Mildenberger, olouvignes, jayanth,
ehabkost, postmodern.mod3, Bart.VanAssche, intel-wired-lan,
netdev, linux-kernel

There have been multiple reports of crashes that look like
kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel: [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel: [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel: [<ffffffff810992c5>] process_one_work+0x155/0x440
kernel: [<ffffffff81099e16>] worker_thread+0x116/0x4b0
kernel: [<ffffffff8109f422>] kthread+0xd2/0xf0
kernel: [<ffffffff8163184f>] ret_from_fork+0x3f/0x70

These can be traced back to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
leads to a null deref in timecounter_read().

Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
e1000e_get_base_timinca() in such a way that it can return -EINVAL for
e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.

Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
sometimes don't have the SYSCFI bit set. Retrying the read shortly after
finds the bit to be set. This was observed at boot (probe) but also link up
and link down.

Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
reads where SYSCFI=0. Therefore, remove this register read and
unconditionally set the clock parameters.

Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
Fixes: 83129b37ef35 ("e1000e: fix systim issues")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ec4a9759a6f2..3afb1f3b6f91 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter *adapter, u32 *timinca)
 		}
 		break;
 	case e1000_pch_spt:
-		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
-			/* Stable 24MHz frequency */
-			incperiod = INCPERIOD_24MHZ;
-			incvalue = INCVALUE_24MHZ;
-			shift = INCVALUE_SHIFT_24MHZ;
-			adapter->cc.shift = shift;
-			break;
-		}
-		return -EINVAL;
+		/* Stable 24MHz frequency */
+		incperiod = INCPERIOD_24MHZ;
+		incvalue = INCVALUE_24MHZ;
+		shift = INCVALUE_SHIFT_24MHZ;
+		adapter->cc.shift = shift;
+		break;
 	case e1000_pch_cnp:
 		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
 			/* Stable 24MHz frequency */
-- 
2.16.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
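The behavioural change made by the patch for the e1000_pch_spt case can be compared side by side in a user-space model. This is only a sketch: the model_* names are hypothetical, the numeric parameters are placeholders rather than the driver's INCPERIOD_24MHZ/INCVALUE_24MHZ/INCVALUE_SHIFT_24MHZ values, and the register read is reduced to a boolean argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for the model; not the driver's macro values. */
enum {
    MODEL_INCPERIOD = 1,
    MODEL_INCVALUE  = 2,
    MODEL_SHIFT     = 3
};

struct model_clock_params {
    uint32_t incperiod;
    uint32_t incvalue;
    uint32_t shift;
};

static void model_set_24mhz(struct model_clock_params *out)
{
    out->incperiod = MODEL_INCPERIOD;
    out->incvalue = MODEL_INCVALUE;
    out->shift = MODEL_SHIFT;
}

/* Before the patch: parameters are set only if the flag reads as set. */
int model_timinca_before(bool syscfi_set, struct model_clock_params *out)
{
    if (!syscfi_set)
        return -EINVAL;
    model_set_24mhz(out);
    return 0;
}

/* After the patch: the flag is ignored and parameters are always set. */
int model_timinca_after(bool syscfi_set, struct model_clock_params *out)
{
    (void)syscfi_set;
    model_set_24mhz(out);
    return 0;
}
```

With the flag clear, the "before" variant returns -EINVAL and leaves the parameters untouched, while the "after" variant always succeeds, so the reset path can no longer skip the time counter initialisation for this MAC type.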
* RE: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
2018-05-10 7:28 ` [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes Benjamin Poirier
@ 2018-05-10 18:42 ` Keller, Jacob E
2018-05-13 6:55 ` [Intel-wired-lan] " Neftin, Sasha
2018-05-23 0:44 ` Brown, Aaron F
1 sibling, 1 reply; 6+ messages in thread
From: Keller, Jacob E @ 2018-05-10 18:42 UTC (permalink / raw)
To: Benjamin Poirier, Kirsher, Jeffrey T
Cc: Achim Mildenberger, olouvignes@gmail.com, jayanth@goubiq.com,
ehabkost@redhat.com, postmodern.mod3@gmail.com,
Bart.VanAssche@wdc.com, intel-wired-lan@lists.osuosl.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Benjamin Poirier [mailto:bpoirier@suse.com]
> Sent: Thursday, May 10, 2018 12:29 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; Achim Mildenberger
> <admin@fph.physik.uni-karlsruhe.de>; olouvignes@gmail.com;
> jayanth@goubiq.com; ehabkost@redhat.com; postmodern.mod3@gmail.com;
> Bart.VanAssche@wdc.com; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
>
> There have been multiple reports of crashes that look like
> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
> [...]
> kernel: Call Trace:
> kernel: [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
> kernel: [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
> kernel: [<ffffffff810992c5>] process_one_work+0x155/0x440
> kernel: [<ffffffff81099e16>] worker_thread+0x116/0x4b0
> kernel: [<ffffffff8109f422>] kthread+0xd2/0xf0
> kernel: [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
>
> These can be traced back to the fact that e1000e_systim_reset() skips the
> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
> leads to a null deref in timecounter_read().
>
> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
>
> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
> finds the bit to be set. This was observed at boot (probe) but also link up
> and link down.
>
> Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
> reads where SYSCFI=0. Therefore, remove this register read and
> unconditionally set the clock parameters.
>
> Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
> drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
> 1 file changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index ec4a9759a6f2..3afb1f3b6f91 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter
> *adapter, u32 *timinca)
>  		}
>  		break;
>  	case e1000_pch_spt:
> -		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
> -			/* Stable 24MHz frequency */
> -			incperiod = INCPERIOD_24MHZ;
> -			incvalue = INCVALUE_24MHZ;
> -			shift = INCVALUE_SHIFT_24MHZ;
> -			adapter->cc.shift = shift;
> -			break;
> -		}
> -		return -EINVAL;
> +		/* Stable 24MHz frequency */
> +		incperiod = INCPERIOD_24MHZ;
> +		incvalue = INCVALUE_24MHZ;
> +		shift = INCVALUE_SHIFT_24MHZ;
> +		adapter->cc.shift = shift;
> +		break;
>  	case e1000_pch_cnp:
>  		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>  			/* Stable 24MHz frequency */
> --
> 2.16.3

Given testing showing that the clock operates fine regardless of the
register read, I think this is probably fine. Normally I believe the
register was used to check which frequency was in use, but it doesn't seem
to serve that purpose here.

Thanks,
Jake
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
2018-05-10 18:42 ` Keller, Jacob E
@ 2018-05-13 6:55 ` Neftin, Sasha
0 siblings, 0 replies; 6+ messages in thread
From: Neftin, Sasha @ 2018-05-13 6:55 UTC (permalink / raw)
To: Keller, Jacob E, Benjamin Poirier, Kirsher, Jeffrey T
Cc: ehabkost@redhat.com, netdev@vger.kernel.org, jayanth@goubiq.com,
linux-kernel@vger.kernel.org, postmodern.mod3@gmail.com,
Achim Mildenberger, intel-wired-lan@lists.osuosl.org,
Bart.VanAssche@wdc.com, olouvignes@gmail.com

On 5/10/2018 21:42, Keller, Jacob E wrote:
>> -----Original Message-----
>> From: Benjamin Poirier [mailto:bpoirier@suse.com]
>> Sent: Thursday, May 10, 2018 12:29 AM
>> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
>> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; Achim Mildenberger
>> <admin@fph.physik.uni-karlsruhe.de>; olouvignes@gmail.com;
>> jayanth@goubiq.com; ehabkost@redhat.com; postmodern.mod3@gmail.com;
>> Bart.VanAssche@wdc.com; intel-wired-lan@lists.osuosl.org;
>> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
>>
>> There have been multiple reports of crashes that look like
>> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
>> [...]
>> kernel: Call Trace:
>> kernel: [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
>> kernel: [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
>> kernel: [<ffffffff810992c5>] process_one_work+0x155/0x440
>> kernel: [<ffffffff81099e16>] worker_thread+0x116/0x4b0
>> kernel: [<ffffffff8109f422>] kthread+0xd2/0xf0
>> kernel: [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
>>
>> These can be traced back to the fact that e1000e_systim_reset() skips the
>> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
>> leads to a null deref in timecounter_read().
>>
>> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
>> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
>> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
>>
>> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
>> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
>> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
>> finds the bit to be set. This was observed at boot (probe) but also link up
>> and link down.
>>
>> Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
>> reads where SYSCFI=0. Therefore, remove this register read and
>> unconditionally set the clock parameters.
>>
>> Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
>> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
>> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
>> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
>> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
>> ---
>> drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
>> 1 file changed, 6 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index ec4a9759a6f2..3afb1f3b6f91 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter
>> *adapter, u32 *timinca)
>>  		}
>>  		break;
>>  	case e1000_pch_spt:
>> -		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>> -			/* Stable 24MHz frequency */
>> -			incperiod = INCPERIOD_24MHZ;
>> -			incvalue = INCVALUE_24MHZ;
>> -			shift = INCVALUE_SHIFT_24MHZ;
>> -			adapter->cc.shift = shift;
>> -			break;
>> -		}
>> -		return -EINVAL;
>> +		/* Stable 24MHz frequency */
>> +		incperiod = INCPERIOD_24MHZ;
>> +		incvalue = INCVALUE_24MHZ;
>> +		shift = INCVALUE_SHIFT_24MHZ;
>> +		adapter->cc.shift = shift;
>> +		break;
>>  	case e1000_pch_cnp:
>>  		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
>>  			/* Stable 24MHz frequency */
>> --
>> 2.16.3
>
> Given testing showing that the clock operates fine regardless of the
> register read, I think this is probably fine. Normally I believe the
> register was used to check which frequency was in use, but it doesn't
> seem to serve that purpose here.
>
> Thanks,
> Jake
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
>

I've checked our specification, looks only 24MHz used for this product.
Hope no different platform with another clock support has been distributed.
So, let's pick up this change.
^ permalink raw reply [flat|nested] 6+ messages in thread
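Since the discussion settles on a fixed 24 MHz clock, a short worked calculation shows why such a clock is typically advanced by a whole-nanosecond amount every few cycles rather than every cycle. The sketch below is plain arithmetic; the function name is hypothetical, and the thread does not show the values of the INCPERIOD_24MHZ and INCVALUE_24MHZ macros, so no claim is made about them here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the whole number of nanoseconds that elapse in `cycles` ticks
 * of a clock running at `hz`, or 0 if that duration is not an exact
 * integer number of nanoseconds.
 */
uint64_t model_ns_per_cycles(uint64_t hz, uint64_t cycles)
{
    const uint64_t ns_per_sec = 1000000000ull;
    uint64_t total = cycles * ns_per_sec;

    return (total % hz == 0) ? total / hz : 0;
}
```

One 24 MHz cycle lasts 41.67 ns, which is not an integer, whereas three cycles last exactly 125 ns; a 25 MHz clock, by comparison, gives an exact 40 ns per cycle.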
* RE: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
2018-05-10 7:28 ` [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes Benjamin Poirier
2018-05-10 18:42 ` Keller, Jacob E
@ 2018-05-23 0:44 ` Brown, Aaron F
1 sibling, 0 replies; 6+ messages in thread
From: Brown, Aaron F @ 2018-05-23 0:44 UTC (permalink / raw)
To: Benjamin Poirier, Kirsher, Jeffrey T
Cc: ehabkost@redhat.com, netdev@vger.kernel.org, jayanth@goubiq.com,
linux-kernel@vger.kernel.org, Bart.VanAssche@wdc.com,
postmodern.mod3@gmail.com, Achim Mildenberger,
intel-wired-lan@lists.osuosl.org, olouvignes@gmail.com

> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Benjamin Poirier
> Sent: Thursday, May 10, 2018 12:29 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: ehabkost@redhat.com; netdev@vger.kernel.org; jayanth@goubiq.com;
> linux-kernel@vger.kernel.org; Bart.VanAssche@wdc.com;
> postmodern.mod3@gmail.com; Achim Mildenberger
> <admin@fph.physik.uni-karlsruhe.de>; intel-wired-lan@lists.osuosl.org;
> olouvignes@gmail.com
> Subject: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting
> I219 clock attributes
>
> There have been multiple reports of crashes that look like
> kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
> [...]
> kernel: Call Trace:
> kernel: [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
> kernel: [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80
> [e1000e]
> kernel: [<ffffffff810992c5>] process_one_work+0x155/0x440
> kernel: [<ffffffff81099e16>] worker_thread+0x116/0x4b0
> kernel: [<ffffffff8109f422>] kthread+0xd2/0xf0
> kernel: [<ffffffff8163184f>] ret_from_fork+0x3f/0x70
>
> These can be traced back to the fact that e1000e_systim_reset() skips the
> timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
> leads to a null deref in timecounter_read().
>
> Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
> e1000e_get_base_timinca() in such a way that it can return -EINVAL for
> e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.
>
> Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
> adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
> sometimes don't have the SYSCFI bit set. Retrying the read shortly after
> finds the bit to be set. This was observed at boot (probe) but also link up
> and link down.
>
> Moreover, the phc (PTP Hardware Clock) seems to operate normally even
> after
> reads where SYSCFI=0. Therefore, remove this register read and
> unconditionally set the clock parameters.
>
> Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
> Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
> Fixes: 83129b37ef35 ("e1000e: fix systim issues")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
> drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
> 1 file changed, 6 insertions(+), 9 deletions(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-05-23 0:44 UTC | newest]
Thread overview: 6+ messages
2018-04-25 6:52 e1000e I219 timestamping oops related to TSYNCRXCTL read Benjamin Poirier
2018-04-25 19:59 ` Keller, Jacob E
2018-05-10 7:28 ` [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes Benjamin Poirier
2018-05-10 18:42 ` Keller, Jacob E
2018-05-13 6:55 ` [Intel-wired-lan] " Neftin, Sasha
2018-05-23 0:44 ` Brown, Aaron F