From: Auke Kok <auke-jan.h.kok@intel.com>
To: Kenzo Iwami <k-iwami@cj.jp.nec.com>
Cc: Auke Kok <auke-jan.h.kok@intel.com>,
netdev@vger.kernel.org,
Jesse Brandeburg <jesse.brandeburg@intel.com>,
"Ronciak, John" <john.ronciak@intel.com>
Subject: Re: watchdog timeout panic in e1000 driver
Date: Mon, 30 Oct 2006 09:30:24 -0800
Message-ID: <454636B0.1010004@intel.com>
In-Reply-To: <4545E3A4.9090004@cj.jp.nec.com>
Kenzo Iwami wrote:
> Hi,
>
> Thank you for your comment.
>
>>>>>> Anyway as I said in the same e-mail, we're working on reducing the lock timeout to a
>>>>>> reasonable time. This will unfortunately take some time, as we need to change some major
>>>>>> components in the driver to make sure this doesn't happen.
>>>>> How about the following approach?
>>>>>
>>>>> If acquiring semaphore fails inside the interrupt handler, acquiring semaphore
>>>>> is abandoned immediately without waiting for timeout.
>>>>> However, I don't know whether this method affects other processes.
>>>>
>>>> with the current hardware being accessed simultaneously from several users in the
>>>> kernel, that would lead to large problems - the watchdog task accesses it every 2
>>>> seconds as it reads the PHY link status, so when one of those fails the driver would
>>>> have no choice but to reset the entire device.
>>>
>>> This problem occurs because the interrupt handler is executed while the
>>> interrupted code is still holding the semaphore. Acquiring the semaphore
>>> fails regardless of the timeout period.
>>>
>>> I think the watchdog task will fail trying to read the PHY link status,
>>> even if the lock timeout period has been reduced.
>>
>> correct, we're not looking into reducing the lock timeout but towards reducing the total
>> lock time. Once we have reduced that to something acceptable, we can reduce the timeout
>> accordingly.
>
> Even if the total lock time can be reduced, it's possible that the interrupt
> handler is executed while the interrupted code is still holding the semaphore.
> I think your method only decreases the frequency of this problem.
> Why does reducing the lock time solve this problem?
There are several problems here that need addressing. It's not acceptable for our driver
to wait up to 15 seconds, and we can (presumably) reduce that to milliseconds, which
would help a lot. In no case should we hold the semaphore for longer than (give or take)
half a second, so working towards that is a very good step in the right direction.

Adding the timer task back may also help, since we would then no longer be trying to
acquire the sw_fw_semaphore in interrupt context. But we removed it for a reason, and I
need to dig up exactly what that reason was before we can revert the change. Jesse might
know, so I'll talk to him. Even so, that will not fix the fact that the semaphore is held
for a long time :)

So, this needs two fixes. Much to do.
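
To make the interrupt-context half of this concrete, here is a minimal userspace sketch of the idea. It is not e1000 code: `fake_sem`, `fake_dev`, `fake_irq_handler` and `fake_link_task` are all hypothetical names, and the semaphore is a plain flag rather than the real sw_fw_semaphore. The only assumption it illustrates is that the handler never waits for the semaphore and leaves the PHY read to a task that runs later, outside interrupt context.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the SW/FW semaphore: a plain flag, not hardware. */
struct fake_sem {
	bool held;
};

/* One attempt, no waiting. Returns true if the caller now owns the semaphore. */
static bool fake_sem_try_acquire(struct fake_sem *s)
{
	if (s->held)
		return false;
	s->held = true;
	return true;
}

static void fake_sem_release(struct fake_sem *s)
{
	s->held = false;
}

struct fake_dev {
	struct fake_sem sem;
	bool link_check_pending;	/* set by the "interrupt", cleared by the task */
	int phy_reads;			/* completed (simulated) PHY link status reads */
};

/*
 * Simulated interrupt handler. It runs on top of the interrupted code, so if
 * that code holds the semaphore, waiting here cannot succeed no matter how
 * long the timeout is. It therefore only records that work is pending.
 */
static void fake_irq_handler(struct fake_dev *d)
{
	d->link_check_pending = true;
}

/*
 * Simulated timer task, running outside interrupt context. Returns true if
 * the pending link check was done, false if nothing was pending or the
 * semaphore was busy (the caller would simply run the task again later).
 */
static bool fake_link_task(struct fake_dev *d)
{
	if (!d->link_check_pending)
		return false;
	if (!fake_sem_try_acquire(&d->sem))
		return false;
	d->phy_reads++;			/* stands in for the real PHY read */
	fake_sem_release(&d->sem);
	d->link_check_pending = false;
	return true;
}
```

If some code takes the semaphore and the "interrupt" fires before it releases, the handler just sets the flag and returns; the task does the read on its first run after the release. Note that this only covers the first fix. The task still has to retry for as long as the semaphore is held, so the hold time itself still needs to come down.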
Cheers,
Auke