public inbox for linux-ia64@vger.kernel.org
* [Linux-ia64] Software IO-TLB Kernel panic
@ 2001-05-17 11:44 Martin Wilck
  2001-05-17 15:00 ` David Mosberger
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Martin Wilck @ 2001-05-17 11:44 UTC (permalink / raw)
  To: linux-ia64

Hi,

I am reproducibly getting kernel panics when accessing discs on an
Adaptec 39160 adapter (SCSI host 1 after the built-in QLA1280).
I am using the "new" aic7xxx driver on a 2.4.4 IA64 kernel.
I configured the kernel for DIG-compliant C0-stepping hardware.

The kernel panic occurs in map_single (arch/ia64/lib/swiotlb.c:171).

I have a 2-CPU Lion with C0-stepping CPUs. The requested
IO TLB size is 8192 when the panic occurs.

After the crash, I always have severe corruption on the filesystem that
was being accessed during the crash. e2fsck reports ~10 illegal blocks in
one inode, and many follow-up errors.

The only TLB related boot message I see is
kernel: Placing software IO TLB between 0xe000000000100000 - 0xe000000000300000

I am looking into this myself right now, but I would be grateful for hints
where to start. Any help appreciated,

Martin

-- 
Martin Wilck     <Martin.Wilck@fujitsu-siemens.com>
FSC EP PS DS1, Paderborn      Tel. +49 5251 8 15113






* Re: [Linux-ia64] Software IO-TLB Kernel panic
  2001-05-17 11:44 [Linux-ia64] Software IO-TLB Kernel panic Martin Wilck
@ 2001-05-17 15:00 ` David Mosberger
  2001-05-17 17:40 ` Martin Wilck
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: David Mosberger @ 2001-05-17 15:00 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Thu, 17 May 2001 13:44:50 +0200 (CEST), Martin Wilck <Martin.Wilck@fujitsu-siemens.com> said:

  Martin> I am reproducibly getting kernel panics when accessing
  Martin> discs on an Adaptec 39160 adapter (SCSI host 1 after the
  Martin> built-in QLA1280).  I am using the "new" aic7xxx driver on a
  Martin> 2.4.4 IA64 kernel.  I configured the kernel for
  Martin> DIG-compliant C0-stepping hardware.

  Martin> The kernel panic occurs in map_single
  Martin> (arch/ia64/lib/swiotlb.c:171).

Sounds like the driver is trying to map buffers that are bigger than
(1 << IO_TLB_SHIFT) (currently 2KB).

	--david



* Re: [Linux-ia64] Software IO-TLB Kernel panic
  2001-05-17 11:44 [Linux-ia64] Software IO-TLB Kernel panic Martin Wilck
  2001-05-17 15:00 ` David Mosberger
@ 2001-05-17 17:40 ` Martin Wilck
  2001-05-17 18:49 ` David Mosberger
  2001-05-18 19:00 ` [Linux-ia64] Software IO-TLB Kernel panic - preliminary analysis Martin Wilck
  3 siblings, 0 replies; 5+ messages in thread
From: Martin Wilck @ 2001-05-17 17:40 UTC (permalink / raw)
  To: linux-ia64

>
>   Martin> The kernel panic occurs in map_single
>   Martin> (arch/ia64/lib/swiotlb.c:171).
>
> Sounds like the driver is trying to map buffers that are bigger than
> (1 << IO_TLB_SHIFT) (currently 2KB).

I looked at the code and thought that buffers bigger than 2kB can (in
principle) be created by joining consecutive buffers.
Of course, it could happen that no such contiguous space is available.
Am I wrong?
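
The joining idea above can be sketched in a few lines of C. This is only
an illustration under stated assumptions, not the code in
arch/ia64/lib/swiotlb.c: the names NSLABS, slot_used, slots_needed,
alloc_slots and free_slots are hypothetical, the pool is assumed to be
1024 slots of (1 << IO_TLB_SHIFT) = 2KB each, and a plain first-fit
search stands in for whatever search the kernel actually performs.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration: IO_TLB_SHIFT = 11 gives the 2KB slot
 * size mentioned in the thread; NSLABS = 1024 corresponds to a 2MB pool. */
#define IO_TLB_SHIFT 11
#define SLOT_SIZE    (1UL << IO_TLB_SHIFT)
#define NSLABS       1024

static unsigned char slot_used[NSLABS];  /* 1 = slot is in use */

/* Number of 2KB slots needed to hold `size` bytes, rounded up. */
static size_t slots_needed(size_t size)
{
    return (size + SLOT_SIZE - 1) >> IO_TLB_SHIFT;
}

/* First-fit search for a run of free, contiguous slots big enough for
 * `size` bytes. Marks the run as used and returns the index of its first
 * slot, or -1 if no sufficiently long run exists. */
static long alloc_slots(size_t size)
{
    size_t n = slots_needed(size);
    size_t run = 0;

    if (n == 0 || n > NSLABS)
        return -1;
    for (size_t i = 0; i < NSLABS; i++) {
        run = slot_used[i] ? 0 : run + 1;
        if (run == n) {
            size_t first = i + 1 - n;
            for (size_t j = first; j <= i; j++)
                slot_used[j] = 1;
            return (long)first;
        }
    }
    return -1;  /* pool exhausted or too fragmented */
}

/* Release the slots of a mapping previously returned by alloc_slots(). */
static void free_slots(long first, size_t size)
{
    size_t n = slots_needed(size);

    for (size_t j = 0; j < n; j++)
        slot_used[(size_t)first + j] = 0;
}
```

In this sketch an 8192-byte request occupies four adjacent slots, and the
-1 return corresponds to the case where no contiguous space is left.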

Martin

-- 
Martin Wilck     <Martin.Wilck@fujitsu-siemens.com>
FSC EP PS DS1, Paderborn      Tel. +49 5251 8 15113






* Re: [Linux-ia64] Software IO-TLB Kernel panic
  2001-05-17 11:44 [Linux-ia64] Software IO-TLB Kernel panic Martin Wilck
  2001-05-17 15:00 ` David Mosberger
  2001-05-17 17:40 ` Martin Wilck
@ 2001-05-17 18:49 ` David Mosberger
  2001-05-18 19:00 ` [Linux-ia64] Software IO-TLB Kernel panic - preliminary analysis Martin Wilck
  3 siblings, 0 replies; 5+ messages in thread
From: David Mosberger @ 2001-05-17 18:49 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Thu, 17 May 2001 19:40:17 +0200 (CEST), Martin Wilck <Martin.Wilck@fujitsu-siemens.com> said:

  >>
  Martin> The kernel panic occurs in map_single
  Martin> (arch/ia64/lib/swiotlb.c:171).
  >>  Sounds like the driver is trying to map buffers that are bigger
  >> than (1 << IO_TLB_SHIFT) (currently 2KB).

  Martin> I looked at the code and thought that bigger buffers than
  Martin> 2kB can (in principle) be created by joining subsequent
  Martin> buffers.  Of course, it could happen that no such contiguous
  Martin> space is available.  Am I wrong ?

Yes, you're right.  I forgot about the joining.

	--david



* [Linux-ia64] Software IO-TLB Kernel panic - preliminary analysis
  2001-05-17 11:44 [Linux-ia64] Software IO-TLB Kernel panic Martin Wilck
                   ` (2 preceding siblings ...)
  2001-05-17 18:49 ` David Mosberger
@ 2001-05-18 19:00 ` Martin Wilck
  3 siblings, 0 replies; 5+ messages in thread
From: Martin Wilck @ 2001-05-18 19:00 UTC (permalink / raw)
  To: linux-ia64

This problem is really hard to hunt down, as even kdb will not
respond anymore after the crash happens. Also, my system logs are
truncated.

What I have seen, though, is that IO-TLBs are allocated in rapid succession
immediately before the crash. Using some printk's, I saw
133 allocations of 8192-byte chunks in a row without a single
deallocation immediately before the machine came down. This alone accounts
for about half of the bounce buffer space, not counting space that
was allocated earlier or any further allocations that
I may have missed because of the lines lost from the log.
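
The "about half" estimate can be checked against the boot message quoted
in the first mail. The following is a small sketch of that arithmetic; the
helper names pool_bytes, outstanding_bytes and percent_of_pool are made up
for illustration, and the pool bounds are simply the two addresses from
the "Placing software IO TLB between ..." line.

```c
#include <assert.h>
#include <stdint.h>

/* Pool bounds as printed in the boot message earlier in the thread. */
#define IO_TLB_START 0xe000000000100000ULL
#define IO_TLB_END   0xe000000000300000ULL

/* Total size of the software IO TLB (bounce buffer) pool in bytes. */
static uint64_t pool_bytes(void)
{
    return IO_TLB_END - IO_TLB_START;
}

/* Bytes held by `count` outstanding mappings of `chunk` bytes each. */
static uint64_t outstanding_bytes(uint64_t count, uint64_t chunk)
{
    return count * chunk;
}

/* Share of the pool taken by `bytes`, as a whole percentage (rounded down). */
static unsigned percent_of_pool(uint64_t bytes)
{
    return (unsigned)(bytes * 100 / pool_bytes());
}
```

With these numbers the pool is 2097152 bytes (2MB), and 133 chunks of
8192 bytes come to 1089536 bytes, a little over half of it. 256 such
chunks would fill the pool completely.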

By inspecting elements of the pci_dev structure passed to the routine,
I am now 99% convinced that the Adaptec 7899a controller
is the "guilty" device. This fits well with the finding that the
crashes always occur after (!) a file system on that card has come
under somewhat heavier use.

It seems that the problem does not occur with the "old" aic7xxx
driver. In contrast to the new one, that driver seems to deallocate
every buffer immediately after allocation.

Thus, for the time being I'd recommend using the aic7xxx_old driver.
But it looks like a problem that ought to be solved.

Should I perhaps approach the aic7xxx maintainers?

Regards,
Martin

-- 
Martin Wilck     <Martin.Wilck@fujitsu-siemens.com>
FSC EP PS DS1, Paderborn      Tel. +49 5251 8 15113





