* swiotlb buffer is full
From: Wolfgang Denk @ 2016-02-15 12:25 UTC
To: linux-raid
Hello,
First, I would like to apologize in advance if this is off topic
here. I'm not really sure which kernel component is causing the
problem, but as it is triggered by an MD task I thought I would start
by asking here...
The problem is that the system more or less reliably crashes with
"swiotlb buffer is full" errors when a data-check on the RAID arrays
is run.
The RAID configuration looks like this:
-> cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid1 sdf3[2] sdg3[3]
970206800 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sdf1[0] sdg1[1]
262132 blocks super 1.0 [2/2] [UU]
md3 : active raid1 sdc[1] sdb[0]
117219728 blocks super 1.2 [2/2] [UU]
md2 : active raid6 sdk[5] sdi[3] sdj[4] sdh[2] sde[1] sdd[0]
3907049792 blocks super 1.2 level 6, 16k chunk, algorithm 2 [6/6] [UUUUUU]
unused devices: <none>
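For readers less familiar with the /proc/mdstat format: the bracketed pair such as [2/2] gives the number of member devices the array expects and the number currently active, so an array that has dropped a disk would show, for example, [2/1]. A minimal Python sketch of that check follows; the function names are hypothetical and the sample text is copied from the listing above.

```python
import re

# Sample copied from the listing above (indentation is not significant here).
SAMPLE = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid1 sdf3[2] sdg3[3]
970206800 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sdf1[0] sdg1[1]
262132 blocks super 1.0 [2/2] [UU]
md3 : active raid1 sdc[1] sdb[0]
117219728 blocks super 1.2 [2/2] [UU]
md2 : active raid6 sdk[5] sdi[3] sdj[4] sdh[2] sde[1] sdd[0]
3907049792 blocks super 1.2 level 6, 16k chunk, algorithm 2 [6/6] [UUUUUU]
unused devices: <none>
"""


def parse_mdstat(text):
    """Return {array name: (expected devices, active devices)}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"\s*(md\d+)\s*:\s*active", line)
        if header:
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            arrays[current] = (int(counts.group(1)), int(counts.group(2)))
            current = None
    return arrays


def degraded(arrays):
    """Names of arrays with fewer active devices than expected."""
    return sorted(name for name, (want, have) in arrays.items() if have < want)


print(parse_mdstat(SAMPLE))
print(degraded(parse_mdstat(SAMPLE)))
```

Run against the real /proc/mdstat after a crash, a check like this would be expected to list md1 once /dev/sdf3 has been dropped.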
What happens is always the same: a cron job will trigger a data-check
on the raid arrays, and then it goes like this:
Jan 3 04:00:01 castor kernel: md: data-check of RAID array md1
Jan 3 04:00:01 castor kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Jan 3 04:00:01 castor kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jan 3 04:00:01 castor kernel: md: using 128k window, over a total of 970206800k.
Jan 3 04:00:08 castor kernel: md: delaying data-check of md0 until md1 has finished (they share one or more physical units)
Jan 3 04:00:14 castor kernel: md: data-check of RAID array md3
Jan 3 04:00:14 castor kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Jan 3 04:00:14 castor kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jan 3 04:00:14 castor kernel: md: using 128k window, over a total of 117219728k.
Jan 3 04:00:20 castor kernel: md: data-check of RAID array md2
Jan 3 04:00:20 castor kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Jan 3 04:00:20 castor kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jan 3 04:00:20 castor kernel: md: using 128k window, over a total of 976762448k.
Jan 3 04:23:22 castor kernel: md: md3: data-check done.
Jan 3 04:23:22 castor kernel: md: delaying data-check of md0 until md1 has finished (they share one or more physical units)
Jan 3 04:23:22 castor kernel: md: delaying data-check of md0 until md1 has finished (they share one or more physical units)
Jan 3 04:23:22 castor kernel: md: delaying data-check of md0 until md1 has finished (they share one or more physical units)
Jan 3 04:57:45 castor kernel: sata_via 0000:03:06.0: swiotlb buffer is full (sz: 16384 bytes)
Jan 3 04:57:45 castor kernel: DMA: Out of SW-IOMMU space for 16384 bytes at device 0000:03:06.0
Jan 3 04:57:45 castor kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 3 04:57:45 castor kernel: ata4.00: failed command: READ DMA EXT
Jan 3 04:57:45 castor kernel: ata4.00: cmd 25/00:80:80:df:42/00:05:0d:00:00/e0 tag 0 dma 720896 in#012 res 50/00:00:7f:df:42/00:00:0d:00:00/e0 Emask 0x40 (internal error)
Jan 3 04:57:45 castor kernel: ata4.00: status: { DRDY }
Jan 3 04:57:45 castor kernel: ata4.00: configured for UDMA/133
Jan 3 04:57:45 castor kernel: ata4: EH complete
Jan 3 04:58:06 castor kernel: sata_nv 0000:00:05.0: swiotlb buffer is full (sz: 4096 bytes)
Jan 3 04:58:06 castor kernel: DMA: Out of SW-IOMMU space for 4096 bytes at device 0000:00:05.0
Jan 3 04:58:06 castor kernel: sata_via 0000:03:06.0: swiotlb buffer is full (sz: 16384 bytes)
Jan 3 04:58:06 castor kernel: DMA: Out of SW-IOMMU space for 16384 bytes at device 0000:03:06.0
Jan 3 04:58:06 castor kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 3 04:58:06 castor kernel: ata4.00: failed command: READ DMA EXT
Jan 3 04:58:06 castor kernel: ata4.00: cmd 25/00:80:00:d4:5c/00:06:0d:00:00/e0 tag 0 dma 851968 in#012 res 50/00:00:ff:d3:5c/00:00:0d:00:00/e0 Emask 0x40 (internal error)
Jan 3 04:58:06 castor kernel: ata4.00: status: { DRDY }
Jan 3 04:58:06 castor kernel: ata4.00: configured for UDMA/133
Jan 3 04:58:06 castor kernel: ata15: EH in SWNCQ mode,QC:qc_active 0x10000 sactive 0x10000
Jan 3 04:58:06 castor kernel: ata15: SWNCQ:qc_active 0x0 defer_bits 0x0 last_issue_tag 0x11#012 dhfis 0x0 dmafis 0x0 sdbfis 0x0
Jan 3 04:58:06 castor kernel: ata15: ATA_REG 0x40 ERR_REG 0x0
Jan 3 04:58:06 castor kernel: ata15: tag : dhfis dmafis sdbfis sactive
Jan 3 04:58:06 castor kernel: ata15.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x6
Jan 3 04:58:06 castor kernel: ata15.00: failed command: READ FPDMA QUEUED
Jan 3 04:58:06 castor kernel: ata15.00: cmd 60/00:80:f0:16:85/04:00:1d:00:00/40 tag 16 ncq 524288 in#012 res 40/00:20:f0:e6:84/00:00:1d:00:00/40 Emask 0x40 (internal error)
Jan 3 04:58:06 castor kernel: ata15.00: status: { DRDY }
Jan 3 04:58:06 castor kernel: ata15: hard resetting link
Jan 3 04:58:06 castor kernel: ata15: nv: skipping hardreset on occupied port
Jan 3 04:58:06 castor kernel: ata4: EH complete
Jan 3 04:58:06 castor kernel: ata15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 3 04:58:06 castor kernel: ata15.00: configured for UDMA/133
Jan 3 04:58:06 castor kernel: ata15: EH complete
Jan 3 05:02:49 castor kernel: sata_nv 0000:00:05.0: swiotlb buffer is full (sz: 4096 bytes)
Jan 3 05:02:50 castor kernel: DMA: Out of SW-IOMMU space for 4096 bytes at device 0000:00:05.0
...
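The "cmd" fields in the failed-command lines above can be decoded to see how large the failing transfers were. The sketch below assumes the byte order that libata's error handler prints (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and 512-byte logical sectors; the function name is hypothetical.

```python
SECTOR_BYTES = 512           # assumption: 512-byte logical sectors
NCQ_COMMANDS = {0x60, 0x61}  # READ / WRITE FPDMA QUEUED


def decode_taskfile(cmd_field):
    """Decode a 'cmd' field such as '25/00:80:80:df:42/00:05:0d:00:00/e0'."""
    command, low, high, _device = cmd_field.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )
    command = int(command, 16)
    if command in NCQ_COMMANDS:
        # NCQ commands carry the sector count in the feature registers.
        sectors = (hob_feature << 8) | feature
    else:
        # 48-bit DMA commands carry it in the sector count registers.
        sectors = (hob_nsect << 8) | nsect
    sectors = sectors or 65536  # a count of 0 means 65536 sectors
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {"command": command, "lba": lba,
            "sectors": sectors, "bytes": sectors * SECTOR_BYTES}


for field in ("25/00:80:80:df:42/00:05:0d:00:00/e0",
              "25/00:80:00:d4:5c/00:06:0d:00:00/e0",
              "60/00:80:f0:16:85/04:00:1d:00:00/40"):
    info = decode_taskfile(field)
    print(hex(info["command"]), info["lba"], info["sectors"], info["bytes"])
```

The byte counts derived this way agree with the "dma 720896", "dma 851968" and "ncq 524288" figures in the log, i.e. the failing reads were 704 KiB, 832 KiB and 512 KiB.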
From here the system is basically dead: it will report I/O errors on
the root file system, remount root read-only, and drop /dev/sdf3 from
RAID array /dev/md1. I have to reboot - the root file system is
corrupted so I have to run fsck manually to repair it, and I have to
re-add /dev/sdf3 to /dev/md1 and resync.
This is _exactly_ the same in all cases. For reference, full kernel
logs of 5 such crashes are available at [1]
[1] https://www.amazon.com/clouddrive/share/qZjRneB0tA5TXrNqBQVzyhN6Hy8HzxpLHCKHIhfzYyk
The system has been working fine for years, up to and including
Fedora 22. The crashes started happening after I upgraded to Fedora
23 over Xmas. I've installed all available updates since, to no
avail. The current configuration looks like this:
kernel-4.3.4-300.fc23.x86_64
mdadm-3.3.4-2.fc23.x86_64
As the first error reported is always "swiotlb buffer is full", I
tried adding "swiotlb=32768" to the kernel command line, but this
does not appear to have any effect, as I still see in the kernel
messages:
[ 1.518575] software IO TLB [mem 0xdbff0000-0xdfff0000] (64MB) mapped at [ffff8800dbff0000-ffff8800dffeffff]
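A possible explanation for the unchanged size, offered as an assumption rather than a confirmed diagnosis: the kernel parameter documentation describes the swiotlb= value as a number of I/O TLB slabs, and kernels of this era use 2 KiB slabs (IO_TLB_SHIFT = 11). If that holds here, swiotlb=32768 asks for exactly the 64 MB that is already the default, which would match the boot message. A short Python check of the arithmetic (the helper names are hypothetical):

```python
IO_TLB_SHIFT = 11  # assumption: 2 KiB per slab, as in lib/swiotlb.c of this era
MIB = 1 << 20


def swiotlb_bytes(nslabs, shift=IO_TLB_SHIFT):
    """Bounce buffer size requested by swiotlb=<nslabs>."""
    return nslabs << shift


def region_bytes(start, end):
    """Size of a physical range as printed in the boot message."""
    return end - start


requested = swiotlb_bytes(32768)
mapped = region_bytes(0xdbff0000, 0xdfff0000)
print(requested // MIB, "MiB requested by swiotlb=32768")
print(mapped // MIB, "MiB reported in the boot message")
```

Under the same assumption, a value such as swiotlb=65536 would request 128 MiB. Whether a larger buffer helps is a separate question if the real cause is a leak.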
[I think the software IOMMU implementation is used because this is a
Dual-Core AMD Opteron Processor 2216 on a Supermicro H8DM8-2 main
board; I believe this does not support an IOMMU?]
Has anybody any ideas what might cause such an effect?
I think it is interesting that always the same RAID array gets
kicked, and always the same disk. I cannot see any hardware
problems, and a preventive replacement of the disk drive did not fix
the problem.
What else could I do or try?
Thanks in advance.
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
It is dangerous to be right on a subject on which the established
authorities are wrong. -- Voltaire
* Re: swiotlb buffer is full
From: Shaohua Li @ 2016-02-16 20:13 UTC
To: Wolfgang Denk; +Cc: linux-raid
On Mon, Feb 15, 2016 at 01:25:39PM +0100, Wolfgang Denk wrote:
> I think it is interesting that always the same RAID array gets
> kicked, and always the same disk. I cannot see any hardware
> problems, and a preventive replacement of the disk drive did not fix
> the problem.
This doesn't look like an md problem. It is probably a DMA address leak in the
driver. To verify this, you can do some I/O against the raw disks (sdf/sdg) and
check whether you see the 'swiotlb buffer is full' issue. Do you really need
the IOMMU, e.g. does iommu=off work?
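One way to generate such raw I/O is a plain sequential read of the block device while watching the kernel log. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, reading /dev/sdf or /dev/sdg needs root, and the reads go through the page cache rather than O_DIRECT, so the request pattern may differ from what an md data-check issues.

```python
def read_sequentially(path, chunk_bytes=1 << 20, limit_bytes=None):
    """Read `path` front to back in chunks and return the bytes read.

    `limit_bytes` optionally stops the read early, which is handy for a
    quick smoke test before reading a whole disk.
    """
    total = 0
    with open(path, "rb", buffering=0) as dev:
        while limit_bytes is None or total < limit_bytes:
            want = chunk_bytes
            if limit_bytes is not None:
                want = min(chunk_bytes, limit_bytes - total)
            data = dev.read(want)
            if not data:  # end of device or file
                break
            total += len(data)
    return total


# Example (as root, while following the kernel log in another terminal):
#   read_sequentially("/dev/sdf", limit_bytes=10 * (1 << 30))
```

If "swiotlb buffer is full" appears during such a read with md idle, that would point at the driver rather than at md.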
Thanks,
Shaohua
* Re: swiotlb buffer is full
From: Wolfgang Denk @ 2016-02-18 19:44 UTC
To: Shaohua Li; +Cc: linux-raid
Dear Shaohua Li,
In message <20160216201347.GB13119@kernel.org> you wrote:
>
> > I think it is interesting that always the same RAID array gets
> > kicked, and always the same disk. I cannot see any hardware
> > problems, and a preventive replacement of the disk drive did not fix
> > the problem.
>
> this doesn't look like an md problem.
I tend to agree, but so far I have not found any other test case that
would trigger this problem.
> Probably a dma address leak in the driver. To verify this, you can
> do some IO against the raw disk (sdf/sdg) and check if you see the
> 'swiotlb buffer is full' issue.
At least sequentially reading the drive does not appear to have any
effect; I've completely read it several times with no errors.
> Do you really need iommu, e.g. does iommu=off work?
This is a good idea; I will enable this setting the next time the server
crashes (probably next Sunday night). But then, is iommu=off not
supposed to cause a performance degradation?
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Some people march to the beat of a different drummer. And some people
tango!
* swiotlb buffer is full
From: Ricardo Nabinger Sanchez @ 2018-01-31 16:05 UTC
To: nouveau
Hello,
I've noticed that Firefox randomly gets stuck, and as that sometimes leads to a
complete system lock-up, I checked dmesg and found this:
[Jan29 10:49] nouveau 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[ +0.000033] swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
[ +0.000004] CPU: 6 PID: 1023 Comm: Xorg Not tainted 4.15.0-rc8 #1
[ +0.000003] Hardware name: Micro-Star International Co., Ltd. GX780/GT780/MS-1761, BIOS E1761IMS V3.01 05/02/2011
[ +0.000003] Call Trace:
[ +0.000009] dump_stack+0x9f/0xe1
[ +0.000008] swiotlb_alloc_coherent+0xdf/0x150
[ +0.000010] ttm_dma_pool_get_pages+0x1ec/0x4b0
[ +0.000015] ttm_dma_populate+0x24c/0x340
[ +0.000011] ttm_tt_bind+0x23/0x50
[ +0.000006] ttm_bo_handle_move_mem+0x58c/0x5c0
[ +0.000015] ttm_bo_validate+0x152/0x190
[ +0.000004] ? ttm_bo_init_reserved+0x3d8/0x490
[ +0.000012] ? mutex_trylock+0xcd/0xe0
[ +0.000004] ? ttm_bo_handle_move_mem+0x58/0x5c0
[ +0.000007] ttm_bo_init_reserved+0x3f4/0x490
[ +0.000010] ttm_bo_init+0x2f/0xa0
[ +0.000009] ? nouveau_bo_invalidate_caches+0x10/0x10
[ +0.000005] nouveau_bo_new+0x416/0x590
[ +0.000007] ? nouveau_bo_invalidate_caches+0x10/0x10
[ +0.000009] ? nouveau_gem_new+0x100/0x100
[ +0.000004] nouveau_gem_new+0x49/0x100
[ +0.000009] nouveau_gem_ioctl_new+0x41/0xc0
[ +0.000009] drm_ioctl_kernel+0x59/0xb0
[ +0.000008] drm_ioctl+0x2c1/0x350
[ +0.000007] ? nouveau_gem_new+0x100/0x100
[ +0.000012] ? _raw_spin_unlock_irqrestore+0x4d/0x90
[ +0.000006] ? preempt_count_sub+0x9b/0xd0
[ +0.000005] ? _raw_spin_unlock_irqrestore+0x6b/0x90
[ +0.000008] nouveau_drm_ioctl+0x64/0xc0
[ +0.000009] do_vfs_ioctl+0x8e/0x690
[ +0.000007] ? __fget+0x116/0x200
[ +0.000010] SyS_ioctl+0x74/0x80
[ +0.000009] entry_SYSCALL_64_fastpath+0x23/0x9a
[ +0.000004] RIP: 0033:0x7f7860c70727
[ +0.000003] RSP: 002b:00007ffcb0d3b088 EFLAGS: 00000246
Uptime is about 14 days now and I don't think I've seen this trace before.
Is this useful/worth chasing?
Cheers,
--
Ricardo Nabinger Sanchez http://rnsanchez.wait4.org/
"You never learned anything by doing it right."
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
* Re: swiotlb buffer is full
From: Ilia Mirkin @ 2018-02-01 2:20 UTC
To: Ricardo Nabinger Sanchez, Christian König; +Cc: nouveau, dri-devel
Yeah, a lot of people were getting that, as a result of some drm/ttm
hugepage usage.
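The size in the report fits that explanation: 2097152 bytes is 2 MiB, the x86-64 huge page size, i.e. 512 contiguous 4 KiB pages (an order-9 allocation). A quick check of the arithmetic in Python, assuming 4 KiB base pages; the helper name is hypothetical.

```python
PAGE_BYTES = 4096  # assumption: x86-64 base page size


def allocation_order(size_bytes, page_bytes=PAGE_BYTES):
    """Smallest order n such that 2**n pages cover size_bytes."""
    pages = -(-size_bytes // page_bytes)  # ceiling division
    return (pages - 1).bit_length()


failed = 2097152  # "sz" from the nouveau message
print(failed // PAGE_BYTES, "pages, order", allocation_order(failed))
```

A physically contiguous block of that size is much harder to obtain on a long-running system than a single page, which is consistent with the report appearing only after about two weeks of uptime.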
Christian, did a fix ever end up going out? If so, what kernel was it
included in?
-ilia
* Re: [Nouveau] swiotlb buffer is full
From: Alex Deucher @ 2018-02-01 2:25 UTC
To: Ilia Mirkin
Cc: nouveau, dri-devel, Christian König,
Ricardo Nabinger Sanchez
On Wed, Jan 31, 2018 at 9:20 PM, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> Yeah, a lot of people were getting that, as a result of some drm/ttm
> hugepage usage.
>
> Christian, did a fix ever end up going out? If so, what kernel was it
> included in?
https://lkml.org/lkml/2018/1/16/106
Alex
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
* Re: [Nouveau] swiotlb buffer is full
From: Christian König @ 2018-02-01 9:20 UTC
To: Konrad Rzeszutek Wilk
Cc: nouveau, dri-devel, Ricardo Nabinger Sanchez
Hi Konrad,
just a gentle ping.
It looks like the patch "swiotlb: suppress warning when __GFP_NOWARN is
set" didn't make it into 4.15, and now people are bombarding us with bug
reports about that.
Did you already send that one out for inclusion in 4.16? It also has a
stable tag, so it should be backported to 4.15.x once it lands in Linus'
tree.
Thanks in advance,
Christian.