Hello everyone,
We're currently developing a product that relies heavily on Chromium to display content in a loop (including WebGL and videos) and are running into a few stability issues. At this point we're unsure whether they are WebGL related, video related, or both.
One thing we noticed is that a lot of file descriptors seem to be leaking from the gpu-process. Whenever a video is playing, reloading the page, loading a new URL or closing the tab seems to leak file descriptors, according to /proc/[gpu-process-pid]/fd. However, if we wait for the video to end completely before doing anything else, the leak does not seem to happen. I've been able to reproduce the issue on our custom hardware (imx6qdl based) using Freescale Yocto 1.6, as well as on a Nitrogen6X using the community's 1.7 and 1.8. We tried with both Chromium 38.0.2125.101 and 40.0.2214.91. Galcore version is 5.0.11:25762.
The simplest way to reproduce the issue is to start Chromium with the following video:
(Note: we run Chromium in incognito mode as it seems more stable, but running as a normal session behaves the same)
Once the page is loaded, playing the video a couple of times while monitoring the number of fds in /proc/[gpu-process-pid]/fd should show that there are 3 more fds while the video is playing, and that they disappear once the video ends. For example:
(while playing)
root@imx6qh120:~# ls -al /proc/2007/fd | wc -l
63
(video ended)
root@imx6qh120:~# ls -al /proc/2007/fd | wc -l
60
If you then start the video and hit reload while the video is playing, here is what you should see (result should be similar if you close the tab instead):
(while playing)
root@imx6qh120:~# ls -al /proc/2007/fd | wc -l
70
(video ended)
root@imx6qh120:~# ls -al /proc/2007/fd | wc -l
68
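For anyone trying to reproduce this, the counts above can be collected with a small helper along these lines. This is only a sketch: the function names find_gpu_pid and count_fds are made up for this post, and it assumes the GPU process can be recognized by "--type=gpu-process" on its command line. Note that "ls -al | wc -l" also counts the "." and ".." entries (and usually a "total" line), so the absolute numbers above are a few higher than the real fd count; the helper counts only the fd entries themselves, and the differences between runs are what matter.

```shell
#!/bin/sh
# Sketch only: find_gpu_pid and count_fds are hypothetical helper names.

# Print the PID of the first process whose command line contains
# --type=gpu-process (assumption: that is how the GPU process is flagged).
find_gpu_pid() {
    for d in /proc/[0-9]*; do
        [ -r "$d/cmdline" ] || continue
        # cmdline is NUL-separated; map NULs to spaces before matching.
        if tr '\000' ' ' 2>/dev/null < "$d/cmdline" | grep -q -- '--type=gpu-process'; then
            echo "${d#/proc/}"
            return 0
        fi
    done
    return 1
}

# Print the number of open file descriptors of the given PID.
count_fds() {
    n=0
    for f in /proc/"$1"/fd/*; do
        # Every entry is a symlink; -L also holds for deleted targets.
        if [ -L "$f" ]; then n=$((n + 1)); fi
    done
    echo "$n"
}

# Example (run as root on the target), printing the count once per second:
#   pid=$(find_gpu_pid) && while sleep 1; do count_fds "$pid"; done
```

Running the example loop while playing the video and reloading mid-playback should show the count stepping up and not coming back down.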
Each time this happens, fds appear to be left trailing behind, looking like this:
(here, the chrome process flagged as "--type=gpu-process" is 1176)
ls -al /proc/1176/fd
[…snip]
lrwx------ 1 root root 64 Oct 8 12:03 50 -> /dev/shm/.org.chromium.Chromium.yqa9hs (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 51 -> /dev/shm/.org.chromium.Chromium.rxZ7gw (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 53 -> /dev/shm/.org.chromium.Chromium.wQhgYz (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 54 -> /dev/shm/.org.chromium.Chromium.Sd1bnl (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 55 -> /dev/shm/.org.chromium.Chromium.roolL0 (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 56 -> /dev/shm/.org.chromium.Chromium.8b1eGN (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 59 -> /dev/shm/.org.chromium.Chromium.k8DJ8O (deleted)
lr-x------ 1 root root 64 Oct 8 12:03 6 -> /dev/urandom
lrwx------ 1 root root 64 Oct 8 12:03 63 -> /dev/shm/.org.chromium.Chromium.dC0JSb (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 64 -> /dev/shm/.org.chromium.Chromium.kvYJfm (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 66 -> /dev/shm/.org.chromium.Chromium.3Uvica (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 69 -> /dev/shm/.org.chromium.Chromium.n0KK3C (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 7 -> socket:[9825]
lrwx------ 1 root root 64 Oct 8 12:03 70 -> /dev/shm/.org.chromium.Chromium.4frFLT (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 73 -> /dev/shm/.org.chromium.Chromium.T3zH8f (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 8 -> /run/user/root/weston-shared-mt7RJ3 (deleted)
lrwx------ 1 root root 64 Oct 8 12:06 84 -> socket:[45599]
lrwx------ 1 root root 64 Oct 8 12:06 88 -> socket:[47512]
lrwx------ 1 root root 64 Oct 8 12:03 9 -> /dev/fb0
lrwx------ 1 root root 64 Oct 8 12:06 92 -> /dev/shm/.org.chromium.Chromium.Vjrn54 (deleted)
lrwx------ 1 root root 64 Oct 8 12:03 77 -> /dev/shm/.org.chromium.Chromium.l7PDMZ (deleted)
[…snip…]
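To track only the leftover entries rather than the total fd count, something like the following could be used. Again a sketch under assumptions: count_deleted_fds is a hypothetical name, and the default prefix is simply taken from the "/dev/shm/.org.chromium.Chromium." paths in the listing above. It counts fds whose link target starts with that prefix and is marked "(deleted)".

```shell
#!/bin/sh
# Sketch only: count_deleted_fds is a hypothetical helper name.
# Usage: count_deleted_fds PID [PREFIX]
# Counts fds of PID whose target starts with PREFIX and is marked deleted.
count_deleted_fds() {
    prefix=${2:-/dev/shm/.org.chromium.Chromium.}
    n=0
    for f in /proc/"$1"/fd/*; do
        # readlink fails if the fd went away in the meantime; skip it.
        target=$(readlink "$f" 2>/dev/null) || continue
        case "$target" in
            "$prefix"*" (deleted)") n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Example, using the gpu-process PID from the listing above:
#   count_deleted_fds 1176
```

Comparing this number before and after a reload during playback gives a quick way to see how many shared memory segments each reload leaves behind.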
There are typically 2 crash outputs that we see frequently. The most common one:
----------------------------------------------------------------------
[5101:5101:0930/182607:ERROR:power_save_blocker_ozone.cc(32)] Not implemented reached in virtual content::PowerSaveBlockerImpl::~PowerSaveBlockerImpl()
[8314:8314:0930/182608:ERROR:sandbox_linux.cc(301)] InitializeSandbox() called with multiple threads in process gpu-process
[5101:5101:0930/182609:ERROR:command_buffer_proxy_impl.cc(150)] Could not send GpuCommandBufferMsg_Initialize.
[5101:5101:0930/182609:ERROR:webgraphicscontext3d_command_buffer_impl.cc(213)] CommandBufferProxy::Initialize failed.
[5101:5101:0930/182609:ERROR:webgraphicscontext3d_command_buffer_impl.cc(230)] Failed to initialize command buffer.
[8258:8265:0930/182609:ERROR:webgraphicscontext3d_command_buffer_impl.cc(274)] Failed to initialize GLES2Implementation.
[5152:5177:0930/182609:ERROR:webgraphicscontext3d_command_buffer_impl.cc(274)] Failed to initialize GLES2Implementation.
[8320:8320:0930/182609:ERROR:sandbox_linux.cc(301)] InitializeSandbox() called with multiple threads in process gpu-process
[5152:5177:0930/182609:ERROR:webgraphicscontext3d_command_buffer_impl.cc(274)] Failed to initialize GLES2Implementation.
[8258:8265:0930/182609:ERROR:command_buffer_proxy_impl.cc(150)] Could not send GpuCommandBufferMsg_Initialize.
[5101:5101:0930/182609:ERROR:command_buffer_proxy_impl.cc(150)] Could not send GpuCommandBufferMsg_Initialize.
[5101:5101:0930/182609:ERROR:webgraphicscontext3d_command_buffer_impl.cc(213)] CommandBufferProxy::Initialize failed.
[5101:5101:0930/182609:ERROR:webgraphicscontext3d_command_buffer_impl.cc(230)] Failed to initialize command buffer.
[8258:8265:0930/182609:ERROR:webgraphicscontext3d_command_buffer_impl.cc(213)] CommandBufferProxy::Initialize failed.
[8258:8265:0930/182609:ERROR:webgraphicscontext3d_command_buffer_impl.cc(230)] Failed to initialize command buffer.
[5101:5101:0930/182609:ERROR:surface_factory_ozone.cc(53)] Not implemented reached in virtual scoped_ptr<ui::SurfaceOzoneCanvas> ui::SurfaceFactoryOzone::CreateCanvasForWidget(gfx::AcceleratedWidget)
[5101:5101:0930/182609:FATAL:software_output_device_ozone.cc(22)] Failed to initialize canvas
----------------------------------------------------------------------
And this one:
----------------------------------------------------------------------
[16856:16856:1006/110817:ERROR:power_save_blocker_ozone.cc(29)] Not implemented reached in content::PowerSaveBlockerImpl::PowerSaveBlockerImpl(content::PowerSaveBlocker::PowerSaveBlockerType, const string&)
[17672:17672:1006/110822:ERROR:display.cc(87)] WaylandDisplay failed to initialize hardware
[17672:17672:1006/110822:FATAL:ozone_platform_wayland.cc(106)] failed to initialize display hardware
[16856:16856:1006/110822:ERROR:surface_factory_ozone.cc(53)] Not implemented reached in virtual scoped_ptr<ui::SurfaceOzoneCanvas> ui::SurfaceFactoryOzone::CreateCanvasForWidget(gfx::AcceleratedWidget)
[16856:16856:1006/110822:FATAL:software_output_device_ozone.cc(22)] Failed to initialize canvas
----------------------------------------------------------------------
Also, unless we perform the dreaded trick of dropping caches every now and then, we get the usual:
[742:742:1006/162906:ERROR:texture_manager.cc(1706)] [.GPU-VideoAccelerator-Offscreen-0x78323380]GL ERROR :GL_OUT_OF_MEMORY : glTexImage2D:
I'm basically trying to find out whether anyone knows how "problematic" these file descriptor leaks can be for the system, and whether they could explain why we get so many crashes while loading content. Someone here traced a Chrome crash on a debug build and found that it occurred in libvpu, but couldn't get more useful information. From what we understand, libvpu maps CMA memory for the application, and we suspect that it's possibly being misused, leading to a crash.
Any pointers are appreciated.
Thanks!
--
Dominique Bureau