* swiotlb buffer is full
@ 2018-01-31 16:05 Ricardo Nabinger Sanchez
  2018-02-01  2:20 ` Ilia Mirkin
  0 siblings, 1 reply; 7+ messages in thread
From: Ricardo Nabinger Sanchez @ 2018-01-31 16:05 UTC (permalink / raw)
  To: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hello,

I noticed that Firefox randomly got stuck, and since that sometimes leads
to a complete system lock-up, I checked dmesg and found this:

[Jan29 10:49] nouveau 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[  +0.000033] swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
[  +0.000004] CPU: 6 PID: 1023 Comm: Xorg Not tainted 4.15.0-rc8 #1
[  +0.000003] Hardware name: Micro-Star International Co., Ltd. GX780/GT780/MS-1761, BIOS E1761IMS V3.01 05/02/2011
[  +0.000003] Call Trace:
[  +0.000009]  dump_stack+0x9f/0xe1
[  +0.000008]  swiotlb_alloc_coherent+0xdf/0x150
[  +0.000010]  ttm_dma_pool_get_pages+0x1ec/0x4b0
[  +0.000015]  ttm_dma_populate+0x24c/0x340
[  +0.000011]  ttm_tt_bind+0x23/0x50
[  +0.000006]  ttm_bo_handle_move_mem+0x58c/0x5c0
[  +0.000015]  ttm_bo_validate+0x152/0x190
[  +0.000004]  ? ttm_bo_init_reserved+0x3d8/0x490
[  +0.000012]  ? mutex_trylock+0xcd/0xe0
[  +0.000004]  ? ttm_bo_handle_move_mem+0x58/0x5c0
[  +0.000007]  ttm_bo_init_reserved+0x3f4/0x490
[  +0.000010]  ttm_bo_init+0x2f/0xa0
[  +0.000009]  ? nouveau_bo_invalidate_caches+0x10/0x10
[  +0.000005]  nouveau_bo_new+0x416/0x590
[  +0.000007]  ? nouveau_bo_invalidate_caches+0x10/0x10
[  +0.000009]  ? nouveau_gem_new+0x100/0x100
[  +0.000004]  nouveau_gem_new+0x49/0x100
[  +0.000009]  nouveau_gem_ioctl_new+0x41/0xc0
[  +0.000009]  drm_ioctl_kernel+0x59/0xb0
[  +0.000008]  drm_ioctl+0x2c1/0x350
[  +0.000007]  ? nouveau_gem_new+0x100/0x100
[  +0.000012]  ? _raw_spin_unlock_irqrestore+0x4d/0x90
[  +0.000006]  ? preempt_count_sub+0x9b/0xd0
[  +0.000005]  ? _raw_spin_unlock_irqrestore+0x6b/0x90
[  +0.000008]  nouveau_drm_ioctl+0x64/0xc0
[  +0.000009]  do_vfs_ioctl+0x8e/0x690
[  +0.000007]  ? __fget+0x116/0x200
[  +0.000010]  SyS_ioctl+0x74/0x80
[  +0.000009]  entry_SYSCALL_64_fastpath+0x23/0x9a
[  +0.000004] RIP: 0033:0x7f7860c70727
[  +0.000003] RSP: 002b:00007ffcb0d3b088 EFLAGS: 00000246

Uptime is about 14 days now and I don't think I've seen this trace before.

Is this useful/worth chasing?

Cheers,

-- 
Ricardo Nabinger Sanchez           http://rnsanchez.wait4.org/
        "You never learned anything by doing it right."

* swiotlb buffer is full
@ 2016-02-15 12:25 Wolfgang Denk
  2016-02-16 20:13 ` Shaohua Li
  0 siblings, 1 reply; 7+ messages in thread
From: Wolfgang Denk @ 2016-02-15 12:25 UTC (permalink / raw)
  To: linux-raid

Hello,

First, I would like to apologize in advance if this is off topic
here.  I'm not really sure which kernel component causes the
problem, but as it gets triggered by an MD task I thought I would
start by asking here...


The problem is that the system more or less reliably crashes with
"swiotlb buffer is full" errors when a data-check is run on the RAID
arrays.

The RAID configuration looks like this:

-> cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] 
md1 : active raid1 sdf3[2] sdg3[3]
      970206800 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 sdf1[0] sdg1[1]
      262132 blocks super 1.0 [2/2] [UU]
      
md3 : active raid1 sdc[1] sdb[0]
      117219728 blocks super 1.2 [2/2] [UU]
      
md2 : active raid6 sdk[5] sdi[3] sdj[4] sdh[2] sde[1] sdd[0]
      3907049792 blocks super 1.2 level 6, 16k chunk, algorithm 2 [6/6] [UUUUUU]
      
unused devices: <none>


What happens is always the same: a cron job will trigger a data-check
on the raid arrays, and then it goes like this:

Jan  3 04:00:01 castor kernel: md: data-check of RAID array md1
Jan  3 04:00:01 castor kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Jan  3 04:00:01 castor kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jan  3 04:00:01 castor kernel: md: using 128k window, over a total of 970206800k.
Jan  3 04:00:08 castor kernel: md: delaying data-check of md0 until md1 has finished (they share one or more physical units)
Jan  3 04:00:14 castor kernel: md: data-check of RAID array md3
Jan  3 04:00:14 castor kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Jan  3 04:00:14 castor kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jan  3 04:00:14 castor kernel: md: using 128k window, over a total of 117219728k.
Jan  3 04:00:20 castor kernel: md: data-check of RAID array md2
Jan  3 04:00:20 castor kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Jan  3 04:00:20 castor kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jan  3 04:00:20 castor kernel: md: using 128k window, over a total of 976762448k.
Jan  3 04:23:22 castor kernel: md: md3: data-check done.
Jan  3 04:23:22 castor kernel: md: delaying data-check of md0 until md1 has finished (they share one or more physical units)
Jan  3 04:23:22 castor kernel: md: delaying data-check of md0 until md1 has finished (they share one or more physical units)
Jan  3 04:23:22 castor kernel: md: delaying data-check of md0 until md1 has finished (they share one or more physical units)
Jan  3 04:57:45 castor kernel: sata_via 0000:03:06.0: swiotlb buffer is full (sz: 16384 bytes)
Jan  3 04:57:45 castor kernel: DMA: Out of SW-IOMMU space for 16384 bytes at device 0000:03:06.0
Jan  3 04:57:45 castor kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan  3 04:57:45 castor kernel: ata4.00: failed command: READ DMA EXT
Jan  3 04:57:45 castor kernel: ata4.00: cmd 25/00:80:80:df:42/00:05:0d:00:00/e0 tag 0 dma 720896 in#012         res 50/00:00:7f:df:42/00:00:0d:00:00/e0 Emask 0x40 (internal error)
Jan  3 04:57:45 castor kernel: ata4.00: status: { DRDY }
Jan  3 04:57:45 castor kernel: ata4.00: configured for UDMA/133
Jan  3 04:57:45 castor kernel: ata4: EH complete
Jan  3 04:58:06 castor kernel: sata_nv 0000:00:05.0: swiotlb buffer is full (sz: 4096 bytes)
Jan  3 04:58:06 castor kernel: DMA: Out of SW-IOMMU space for 4096 bytes at device 0000:00:05.0
Jan  3 04:58:06 castor kernel: sata_via 0000:03:06.0: swiotlb buffer is full (sz: 16384 bytes)
Jan  3 04:58:06 castor kernel: DMA: Out of SW-IOMMU space for 16384 bytes at device 0000:03:06.0
Jan  3 04:58:06 castor kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan  3 04:58:06 castor kernel: ata4.00: failed command: READ DMA EXT
Jan  3 04:58:06 castor kernel: ata4.00: cmd 25/00:80:00:d4:5c/00:06:0d:00:00/e0 tag 0 dma 851968 in#012         res 50/00:00:ff:d3:5c/00:00:0d:00:00/e0 Emask 0x40 (internal error)
Jan  3 04:58:06 castor kernel: ata4.00: status: { DRDY }
Jan  3 04:58:06 castor kernel: ata4.00: configured for UDMA/133
Jan  3 04:58:06 castor kernel: ata15: EH in SWNCQ mode,QC:qc_active 0x10000 sactive 0x10000
Jan  3 04:58:06 castor kernel: ata15: SWNCQ:qc_active 0x0 defer_bits 0x0 last_issue_tag 0x11#012  dhfis 0x0 dmafis 0x0 sdbfis 0x0
Jan  3 04:58:06 castor kernel: ata15: ATA_REG 0x40 ERR_REG 0x0
Jan  3 04:58:06 castor kernel: ata15: tag : dhfis dmafis sdbfis sactive
Jan  3 04:58:06 castor kernel: ata15.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x6
Jan  3 04:58:06 castor kernel: ata15.00: failed command: READ FPDMA QUEUED
Jan  3 04:58:06 castor kernel: ata15.00: cmd 60/00:80:f0:16:85/04:00:1d:00:00/40 tag 16 ncq 524288 in#012         res 40/00:20:f0:e6:84/00:00:1d:00:00/40 Emask 0x40 (internal error)
Jan  3 04:58:06 castor kernel: ata15.00: status: { DRDY }
Jan  3 04:58:06 castor kernel: ata15: hard resetting link
Jan  3 04:58:06 castor kernel: ata15: nv: skipping hardreset on occupied port
Jan  3 04:58:06 castor kernel: ata4: EH complete
Jan  3 04:58:06 castor kernel: ata15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan  3 04:58:06 castor kernel: ata15.00: configured for UDMA/133
Jan  3 04:58:06 castor kernel: ata15: EH complete
Jan  3 05:02:49 castor kernel: sata_nv 0000:00:05.0: swiotlb buffer is full (sz: 4096 bytes)
Jan  3 05:02:50 castor kernel: DMA: Out of SW-IOMMU space for 4096 bytes at device 0000:00:05.0
...
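
To see at a glance which devices hit the error and with what request sizes, the "swiotlb buffer is full" lines can be tallied from a saved log. The following is a minimal sketch, not part of the original report: the `tally` helper and the regular expression are illustrative assumptions, and the sample lines are copied from the excerpt above.

```python
import re
from collections import Counter

# Matches "<driver> <pci-address>: swiotlb buffer is full (sz: N bytes)".
# The pattern is an assumption based on the log lines quoted above.
PATTERN = re.compile(r"(\S+) (\S+): swiotlb buffer is full \(sz: (\d+) bytes\)")


def tally(lines):
    """Count failures per (driver, device, request size in bytes)."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[(m.group(1), m.group(2), int(m.group(3)))] += 1
    return counts


sample = [
    "Jan  3 04:57:45 castor kernel: sata_via 0000:03:06.0: swiotlb buffer is full (sz: 16384 bytes)",
    "Jan  3 04:58:06 castor kernel: sata_nv 0000:00:05.0: swiotlb buffer is full (sz: 4096 bytes)",
    "Jan  3 04:58:06 castor kernel: sata_via 0000:03:06.0: swiotlb buffer is full (sz: 16384 bytes)",
    "Jan  3 05:02:49 castor kernel: sata_nv 0000:00:05.0: swiotlb buffer is full (sz: 4096 bytes)",
]

counts = tally(sample)
for (driver, device, size), n in sorted(counts.items()):
    print(f"{driver} {device}: {n} failure(s) of {size} bytes")
```

Run over the full logs, the same tally would show whether the failures stay confined to these two controllers.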

From here the system is basically dead:  it will report I/O errors on
the root file system, remount root read-only, and drop /dev/sdf3 from
RAID array /dev/md1.  I have to reboot - the root file system is
corrupted so I have to run fsck manually to repair it, and I have to
re-add /dev/sdf3 to /dev/md1 and resync.

This is _exactly_ the same in all cases.  For reference, full kernel
logs of 5 such crashes are available at [1] 

[1] https://www.amazon.com/clouddrive/share/qZjRneB0tA5TXrNqBQVzyhN6Hy8HzxpLHCKHIhfzYyk


The system has been working fine for years, up to and including
Fedora 22.  The crashes started happening after I upgraded to Fedora
23 over Xmas.  I've installed all available updates since, to no
avail.  The current configuration looks like this:

kernel-4.3.4-300.fc23.x86_64
mdadm-3.3.4-2.fc23.x86_64


As the first error reported is always "swiotlb buffer is full", I
tried adding "swiotlb=32768" to the kernel command line, but this
does not appear to have any effect, as I still see in the kernel
messages:

[    1.518575] software IO TLB [mem 0xdbff0000-0xdfff0000] (64MB) mapped at [ffff8800dbff0000-ffff8800dffeffff]
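
A quick arithmetic check may explain why nothing changed. It rests on two assumptions that are not stated in this thread: that the `swiotlb=` value is a count of slabs rather than bytes, and that each slab is 2 KiB (the kernel's `IO_TLB_SHIFT` of 11). The helper names below are illustrative only.

```python
import re

# Assumption: one swiotlb slab is 2 KiB (IO_TLB_SHIFT = 11 in the kernel source).
SLAB_BYTES = 1 << 11


def slabs_to_bytes(nslabs):
    """Size in bytes requested by swiotlb=<nslabs>, under the assumption above."""
    return nslabs * SLAB_BYTES


def region_bytes(log_line):
    """Size of the [mem START-END] range printed in the boot message."""
    m = re.search(r"\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]", log_line)
    if m is None:
        raise ValueError("no [mem ...] range found")
    start, end = (int(x, 16) for x in m.groups())
    return end - start


boot_line = ("software IO TLB [mem 0xdbff0000-0xdfff0000] (64MB) "
             "mapped at [ffff8800dbff0000-ffff8800dffeffff]")

MIB = 1 << 20
print("requested:", slabs_to_bytes(32768) // MIB, "MiB")
print("reported: ", region_bytes(boot_line) // MIB, "MiB")
```

Both values come to 64 MiB, so under these assumptions "swiotlb=32768" asks for exactly the size already reported, and a larger slab count (for example 65536 for 128 MiB) would be needed to see a difference.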

[I think the software IOMMU implementation is used because this is a
Dual-Core AMD Opteron Processor 2216 on a Supermicro H8DM8-2 main
board; I believe this does not support an IOMMU?]


Does anybody have any idea what might cause such an effect?

I think it is interesting that it is always the same RAID array that
gets kicked, and always the same disk.  I cannot see any hardware
problems, and a preventive replacement of the disk drive did not fix
the problem.

What else could I do or try?

Thanks in advance.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
It is dangerous to be right on a subject  on  which  the  established
authorities are wrong.                                    -- Voltaire


Thread overview: 7+ messages
2018-01-31 16:05 swiotlb buffer is full Ricardo Nabinger Sanchez
2018-02-01  2:20 ` Ilia Mirkin
2018-02-01  2:25   ` [Nouveau] " Alex Deucher
2018-02-01  9:20     ` Christian König
  -- strict thread matches above, loose matches on Subject: below --
2016-02-15 12:25 Wolfgang Denk
2016-02-16 20:13 ` Shaohua Li
2016-02-18 19:44   ` Wolfgang Denk
