* Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote: [not found] ` <4807377b0810031628x43f79eferdbb9c9c264a5816e@mail.gmail.com> @ 2008-10-03 23:30 ` Jesse Brandeburg [not found] ` <alpine.LNX.1.10.0810041219130.26779@jikos.suse.cz> 1 sibling, 0 replies; 8+ messages in thread From: Jesse Brandeburg @ 2008-10-03 23:30 UTC (permalink / raw) To: Jiri Kosina Cc: Jesse Barnes, David Miller, jesse.brandeburg, linux-kernel, netdev, kkeil, agospoda, arjan, david.graham, bruce.w.allan, john.ronciak, Thomas Gleixner, chris.jones, tim.gardner, airlied, Olaf Kirch, Linus Torvalds On Fri, Oct 3, 2008 at 4:28 PM, Jesse Brandeburg <jesse.brandeburg@gmail.com> wrote: > On Fri, Oct 3, 2008 at 2:45 PM, Jiri Kosina <jkosina@suse.cz> wrote: >> Karsten has been testing kernel with these three patches from the series >> applied: >> >> e1000e: reset swflag after resetting hardware >> e1000e: fix lockdep issues >> e1000e: debug contention on NVM SWFLAG >> >> This was done on a hardware which previously triggered the bug in just a >> few test iterations in quite a reliable way. Now, with these patches >> applied, the EEPROM corruption didn't happen after several tens of >> iterations. >> >> Please note, that the patch that disables the writes to EEPROM on hardware >> level was *not* involved in this testing. >> >> Therefore it currently seems that these three patches really address the >> race condition issue that was present in the e1000e driver. > > Our experience is different. We are also testing with the "protection > patch" reverted. > > We see that the problem specifically comes and goes when > removing/adding the use of set_memory_ro/set_memory_rw to the driver. > > I'm working to catch the bad element in the act with a hardware > breakpoint or an ITP (we're trying both) > >> It is still not clear why the bug started triggering all of a sudden for >> so many people though. > > we plan to keep on working on this until we understand what is going on. I removed the bad addresses from the cc: list ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <alpine.LNX.1.10.0810041219130.26779@jikos.suse.cz>]
[parent not found: <alpine.LFD.2.00.0810041236250.4404@apollo>]
* Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote: [not found] ` <alpine.LFD.2.00.0810041236250.4404@apollo> @ 2008-10-05 1:24 ` Jesse Brandeburg 2008-10-05 8:51 ` Thomas Gleixner 0 siblings, 1 reply; 8+ messages in thread From: Jesse Brandeburg @ 2008-10-05 1:24 UTC (permalink / raw) To: Thomas Gleixner Cc: Jiri Kosina, Jesse Barnes, David Miller, jesse.brandeburg, linux-kernel, netdev, kkeil, agospoda, arjan, david.graham, bruce.w.allan, john.ronciak, chris.jones, tim.gardner, airlied, Olaf Kirch, Linus Torvalds On Sat, Oct 4, 2008 at 4:02 AM, Thomas Gleixner <tglx@linutronix.de> wrote: > On Sat, 4 Oct 2008, Jiri Kosina wrote: >> On Fri, 3 Oct 2008, Jesse Brandeburg wrote: >> > Our experience is different. We are also testing with the "protection >> > patch" reverted. >> > We see that the problem specifically comes and goes when >> > removing/adding the use of set_memory_ro/set_memory_rw to the driver. >> >> But if this patch (which is an obvious workaround, compared to the other >> patches which fix real bugs, right?) would be catching some malicious >> accessess to the mapped EEPROM, there should be stacktraces present in the >> kernel log, right? yes, but I think it is just changing timing, i don't see any backtraces either. > Exactly. The access to a ro region results in a fault. I have nowhere > seen that trigger, but I can reproduce the trylock() WARN_ON, which > confirms that there is concurrent access to the NVRAM registers. The > backtrace pattern is similar to the one you have seen. are you still getting WARN_ON *with* all the mutex based fixes already applied? with the mutex patches in place (without protection patch) we are still reproducing the issue, until we apply the set_memory_ro patch. I had no luck on friday setting a hardware breakpoint on memory access with kgdb to catch the writer with a breakpoint. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote: 2008-10-05 1:24 ` Jesse Brandeburg @ 2008-10-05 8:51 ` Thomas Gleixner 2008-10-05 15:05 ` Arjan van de Ven 0 siblings, 1 reply; 8+ messages in thread From: Thomas Gleixner @ 2008-10-05 8:51 UTC (permalink / raw) To: Jesse Brandeburg Cc: Jiri Kosina, Jesse Barnes, David Miller, jesse.brandeburg, linux-kernel, netdev, kkeil, agospoda, arjan, david.graham, bruce.w.allan, john.ronciak, chris.jones, tim.gardner, airlied, Olaf Kirch, Linus Torvalds On Sat, 4 Oct 2008, Jesse Brandeburg wrote: > > Exactly. The access to a ro region results in a fault. I have nowhere > > seen that trigger, but I can reproduce the trylock() WARN_ON, which > > confirms that there is concurrent access to the NVRAM registers. The > > backtrace pattern is similar to the one you have seen. > > are you still getting WARN_ON *with* all the mutex based fixes already applied? The WARN_ON triggers with current mainline. Is there any fixlet in Linus tree missing ? > with the mutex patches in place (without protection patch) we are > still reproducing the issue, until we apply the set_memory_ro patch. That does not make sense to me. If the memory_ro patch is providing _real_ protection then you _must_ run into an access violation. If not, then the patch just papers over the real problem in some mysterious way. The patch does: + set_memory_rw((unsigned long)hw->flash_address, + hw->flash_len >> PAGE_SHIFT); writew(val, hw->flash_address + reg); + set_memory_ro((unsigned long)hw->flash_address, + hw->flash_len >> PAGE_SHIFT); This changes massively the timing of the flash access. Could this be the problem on the machine which needs the set_memory_ro patch to survive ? > I had no luck on friday setting a hardware breakpoint on memory access > with kgdb to catch the writer with a breakpoint. Well, why should you get a hardware breakpoint when the _ro protection does not trigger in the first place ? Granted there could be a _rw alias mapping, but then the problem must be still visible with the _ro patch applied. Thanks, tglx ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote: 2008-10-05 8:51 ` Thomas Gleixner @ 2008-10-05 15:05 ` Arjan van de Ven 2008-10-05 15:55 ` Thomas Gleixner 0 siblings, 1 reply; 8+ messages in thread From: Arjan van de Ven @ 2008-10-05 15:05 UTC (permalink / raw) To: Thomas Gleixner Cc: Jesse Brandeburg, Jiri Kosina, Jesse Barnes, David Miller, jesse.brandeburg, linux-kernel, netdev, kkeil, agospoda, david.graham, bruce.w.allan, john.ronciak, chris.jones, tim.gardner, airlied, Olaf Kirch, Linus Torvalds Thomas Gleixner wrote: > On Sat, 4 Oct 2008, Jesse Brandeburg wrote: >>> Exactly. The access to a ro region results in a fault. I have nowhere >>> seen that trigger, but I can reproduce the trylock() WARN_ON, which >>> confirms that there is concurrent access to the NVRAM registers. The >>> backtrace pattern is similar to the one you have seen. >> are you still getting WARN_ON *with* all the mutex based fixes already applied? > > The WARN_ON triggers with current mainline. Is there any fixlet in > Linus tree missing ? > >> with the mutex patches in place (without protection patch) we are >> still reproducing the issue, until we apply the set_memory_ro patch. > > That does not make sense to me. If the memory_ro patch is providing > _real_ protection then you _must_ run into an access violation. If not, > then the patch just papers over the real problem in some mysterious > way. > not if the bad code is doing copy_to_user .... (or similar) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote: 2008-10-05 15:05 ` Arjan van de Ven @ 2008-10-05 15:55 ` Thomas Gleixner 2008-10-05 16:02 ` Arjan van de Ven 0 siblings, 1 reply; 8+ messages in thread From: Thomas Gleixner @ 2008-10-05 15:55 UTC (permalink / raw) To: Arjan van de Ven Cc: Jesse Brandeburg, Jiri Kosina, Jesse Barnes, David Miller, jesse.brandeburg, linux-kernel, netdev, kkeil, agospoda, david.graham, bruce.w.allan, john.ronciak, chris.jones, tim.gardner, airlied, Olaf Kirch, Linus Torvalds On Sun, 5 Oct 2008, Arjan van de Ven wrote: > Thomas Gleixner wrote: > > On Sat, 4 Oct 2008, Jesse Brandeburg wrote: > > > > Exactly. The access to a ro region results in a fault. I have nowhere > > > > seen that trigger, but I can reproduce the trylock() WARN_ON, which > > > > confirms that there is concurrent access to the NVRAM registers. The > > > > backtrace pattern is similar to the one you have seen. > > > are you still getting WARN_ON *with* all the mutex based fixes already > > > applied? > > > > The WARN_ON triggers with current mainline. Is there any fixlet in > > Linus tree missing ? > > > > > with the mutex patches in place (without protection patch) we are > > > still reproducing the issue, until we apply the set_memory_ro patch. > > > > That does not make sense to me. If the memory_ro patch is providing > > _real_ protection then you _must_ run into an access violation. If not, > > then the patch just papers over the real problem in some mysterious > > way. > > > > not if the bad code is doing copy_to_user .... (or similar) You mean: copy_from_user :) This would require that the e1000e nvram region is writable via copy_from_user by an e1000e user space interface. A quick grep does not reviel such a horrible interface. Thanks, tglx ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote: 2008-10-05 15:55 ` Thomas Gleixner @ 2008-10-05 16:02 ` Arjan van de Ven 2008-10-05 16:16 ` Thomas Gleixner 0 siblings, 1 reply; 8+ messages in thread From: Arjan van de Ven @ 2008-10-05 16:02 UTC (permalink / raw) To: Thomas Gleixner Cc: Jesse Brandeburg, Jiri Kosina, Jesse Barnes, David Miller, jesse.brandeburg, linux-kernel, netdev, kkeil, agospoda, david.graham, bruce.w.allan, john.ronciak, chris.jones, tim.gardner, airlied, Olaf Kirch, Linus Torvalds On Sun, 5 Oct 2008 17:55:14 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote: > On Sun, 5 Oct 2008, Arjan van de Ven wrote: > > Thomas Gleixner wrote: > > > On Sat, 4 Oct 2008, Jesse Brandeburg wrote: > > > > > Exactly. The access to a ro region results in a fault. I have > > > > > nowhere seen that trigger, but I can reproduce the trylock() > > > > > WARN_ON, which confirms that there is concurrent access to > > > > > the NVRAM registers. The backtrace pattern is similar to the > > > > > one you have seen. > > > > are you still getting WARN_ON *with* all the mutex based fixes > > > > already applied? > > > > > > The WARN_ON triggers with current mainline. Is there any fixlet in > > > Linus tree missing ? > > > > > > > with the mutex patches in place (without protection patch) we > > > > are still reproducing the issue, until we apply the > > > > set_memory_ro patch. > > > > > > That does not make sense to me. If the memory_ro patch is > > > providing _real_ protection then you _must_ run into an access > > > violation. If not, then the patch just papers over the real > > > problem in some mysterious way. > > > > > > > not if the bad code is doing copy_to_user .... (or similar) > > You mean: copy_from_user :) This would require that the e1000e > nvram region is writable via copy_from_user by an e1000e user space > interface. A quick grep does not reviel such a horrible interface. I meant a "copy_to_user" to a duff pointer, somewhere in the kernel. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote: 2008-10-05 16:02 ` Arjan van de Ven @ 2008-10-05 16:16 ` Thomas Gleixner 2008-10-05 17:01 ` Arjan van de Ven 0 siblings, 1 reply; 8+ messages in thread From: Thomas Gleixner @ 2008-10-05 16:16 UTC (permalink / raw) To: Arjan van de Ven Cc: Jesse Brandeburg, Jiri Kosina, Jesse Barnes, David Miller, jesse.brandeburg, linux-kernel, netdev, kkeil, agospoda, david.graham, bruce.w.allan, john.ronciak, chris.jones, tim.gardner, airlied, Olaf Kirch, Linus Torvalds On Sun, 5 Oct 2008, Arjan van de Ven wrote: > On Sun, 5 Oct 2008 17:55:14 +0200 (CEST) > Thomas Gleixner <tglx@linutronix.de> wrote: > > > On Sun, 5 Oct 2008, Arjan van de Ven wrote: > > > Thomas Gleixner wrote: > > > > On Sat, 4 Oct 2008, Jesse Brandeburg wrote: > > > > > > Exactly. The access to a ro region results in a fault. I have > > > > > > nowhere seen that trigger, but I can reproduce the trylock() > > > > > > WARN_ON, which confirms that there is concurrent access to > > > > > > the NVRAM registers. The backtrace pattern is similar to the > > > > > > one you have seen. > > > > > are you still getting WARN_ON *with* all the mutex based fixes > > > > > already applied? > > > > > > > > The WARN_ON triggers with current mainline. Is there any fixlet in > > > > Linus tree missing ? > > > > > > > > > with the mutex patches in place (without protection patch) we > > > > > are still reproducing the issue, until we apply the > > > > > set_memory_ro patch. > > > > > > > > That does not make sense to me. If the memory_ro patch is > > > > providing _real_ protection then you _must_ run into an access > > > > violation. If not, then the patch just papers over the real > > > > problem in some mysterious way. > > > > > > > > > > not if the bad code is doing copy_to_user .... (or similar) > > > > You mean: copy_from_user :) This would require that the e1000e > > nvram region is writable via copy_from_user by an e1000e user space > > interface. A quick grep does not reviel such a horrible interface. > > I meant a "copy_to_user" to a duff pointer, somewhere in the kernel. Hmm, don't we check the *to address on copy_to_user ? Thanks, tglx ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote: 2008-10-05 16:16 ` Thomas Gleixner @ 2008-10-05 17:01 ` Arjan van de Ven 0 siblings, 0 replies; 8+ messages in thread From: Arjan van de Ven @ 2008-10-05 17:01 UTC (permalink / raw) To: Thomas Gleixner Cc: Jesse Brandeburg, Jiri Kosina, Jesse Barnes, David Miller, jesse.brandeburg, linux-kernel, netdev, kkeil, agospoda, david.graham, bruce.w.allan, john.ronciak, chris.jones, tim.gardner, airlied, Olaf Kirch, Linus Torvalds On Sun, 5 Oct 2008 18:16:29 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote: > On Sun, 5 Oct 2008, Arjan van de Ven wrote: > > On Sun, 5 Oct 2008 17:55:14 +0200 (CEST) > > Thomas Gleixner <tglx@linutronix.de> wrote: > > > > > On Sun, 5 Oct 2008, Arjan van de Ven wrote: > > > > Thomas Gleixner wrote: > > > > > On Sat, 4 Oct 2008, Jesse Brandeburg wrote: > > > > > > > Exactly. The access to a ro region results in a fault. I > > > > > > > have nowhere seen that trigger, but I can reproduce the > > > > > > > trylock() WARN_ON, which confirms that there is > > > > > > > concurrent access to the NVRAM registers. The backtrace > > > > > > > pattern is similar to the one you have seen. > > > > > > are you still getting WARN_ON *with* all the mutex based > > > > > > fixes already applied? > > > > > > > > > > The WARN_ON triggers with current mainline. Is there any > > > > > fixlet in Linus tree missing ? > > > > > > > > > > > with the mutex patches in place (without protection patch) > > > > > > we are still reproducing the issue, until we apply the > > > > > > set_memory_ro patch. > > > > > > > > > > That does not make sense to me. If the memory_ro patch is > > > > > providing _real_ protection then you _must_ run into an access > > > > > violation. If not, then the patch just papers over the real > > > > > problem in some mysterious way. > > > > > > > > > > > > > not if the bad code is doing copy_to_user .... (or similar) > > > > > > You mean: copy_from_user :) This would require that the e1000e > > > nvram region is writable via copy_from_user by an e1000e user > > > space interface. A quick grep does not reviel such a horrible > > > interface. > > > > I meant a "copy_to_user" to a duff pointer, somewhere in the kernel. > > Hmm, don't we check the *to address on copy_to_user ? > fair point and we do exception catching for copy_from_user as well on the source, just by how it's implemented -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2008-10-05 17:01 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20080930030825.22950.18891.stgit@jbrandeb-bw.jf.intel.com>
[not found] ` <200810021523.45884.jbarnes@virtuousgeek.org>
[not found] ` <20081003.134634.240211201.davem@davemloft.net>
[not found] ` <200810031429.22598.jbarnes@virtuousgeek.org>
[not found] ` <alpine.LNX.1.10.0810032338140.26779@jikos.suse.cz>
[not found] ` <4807377b0810031628x43f79eferdbb9c9c264a5816e@mail.gmail.com>
2008-10-03 23:30 ` [RFC PATCH 02/12] On Tue, 23 Sep 2008, David Miller wrote: Jesse Brandeburg
[not found] ` <alpine.LNX.1.10.0810041219130.26779@jikos.suse.cz>
[not found] ` <alpine.LFD.2.00.0810041236250.4404@apollo>
2008-10-05 1:24 ` Jesse Brandeburg
2008-10-05 8:51 ` Thomas Gleixner
2008-10-05 15:05 ` Arjan van de Ven
2008-10-05 15:55 ` Thomas Gleixner
2008-10-05 16:02 ` Arjan van de Ven
2008-10-05 16:16 ` Thomas Gleixner
2008-10-05 17:01 ` Arjan van de Ven
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).