* [PATCH] xf86drm.c: add counter for ioctl restarting
@ 2012-04-13 13:26 Anton V. Boyarshinov
2012-04-13 13:40 ` Chris Wilson
2012-04-13 13:42 ` Daniel Vetter
0 siblings, 2 replies; 8+ messages in thread
From: Anton V. Boyarshinov @ 2012-04-13 13:26 UTC (permalink / raw)
To: DRI mailing list
In some cases ioclt->alarm->ioctl loop can be infinite:
ioctl(7, 0x40086482, 0xbfb62738) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn() = ? (mask now [])
ioctl(7, 0x40086482, 0xbfb62738) = ? ERESTARTSYS (To be restarted)
and forever.
It seems, that limiting ioctl restarting by some resonable number of trys
is a dirty but working way to prevent Xorg lockups.
Signed-off-by: Anton V. Boyarshinov <boyarsh@altlinux.org>
---
xf86drm.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/xf86drm.c b/xf86drm.c
index 6ea068f..9663f21 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -162,10 +162,11 @@ int
drmIoctl(int fd, unsigned long request, void *arg)
{
int ret;
+ int count=0;
do {
ret = ioctl(fd, request, arg);
- } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
+ } while (ret == -1 && (errno == EINTR || errno == EAGAIN) && ++count < 100 );
return ret;
}
--
1.7.5.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] xf86drm.c: add counter for ioctl restarting
2012-04-13 13:26 [PATCH] xf86drm.c: add counter for ioctl restarting Anton V. Boyarshinov
@ 2012-04-13 13:40 ` Chris Wilson
2012-04-16 8:45 ` Anton V. Boyarshinov
2012-04-13 13:42 ` Daniel Vetter
1 sibling, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2012-04-13 13:40 UTC (permalink / raw)
To: Anton V. Boyarshinov, DRI mailing list
On Fri, 13 Apr 2012 17:26:42 +0400, "Anton V. Boyarshinov" <boyarsh@altlinux.org> wrote:
> In some cases ioclt->alarm->ioctl loop can be infinite:
> ioctl(7, 0x40086482, 0xbfb62738) = ? ERESTARTSYS (To be restarted)
> --- SIGALRM (Alarm clock) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> ioctl(7, 0x40086482, 0xbfb62738) = ? ERESTARTSYS (To be restarted)
> and forever.
>
> It seems, that limiting ioctl restarting by some resonable number of trys
> is a dirty but working way to prevent Xorg lockups.
And you have audited all callpaths to make sure that they can handle
EINTR? Up until now it was part of the libdrm api that it did not return
EINTR...
>From my naive pov, we could just fix the root cause of the bug rather
than escalating the bug into a random failure.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xf86drm.c: add counter for ioctl restarting
2012-04-13 13:26 [PATCH] xf86drm.c: add counter for ioctl restarting Anton V. Boyarshinov
2012-04-13 13:40 ` Chris Wilson
@ 2012-04-13 13:42 ` Daniel Vetter
2012-04-13 14:45 ` Ville Syrjälä
2012-04-16 8:54 ` Anton V. Boyarshinov
1 sibling, 2 replies; 8+ messages in thread
From: Daniel Vetter @ 2012-04-13 13:42 UTC (permalink / raw)
To: Anton V. Boyarshinov; +Cc: DRI mailing list
On Fri, Apr 13, 2012 at 05:26:42PM +0400, Anton V. Boyarshinov wrote:
> In some cases ioclt->alarm->ioctl loop can be infinite:
> ioctl(7, 0x40086482, 0xbfb62738) = ? ERESTARTSYS (To be restarted)
> --- SIGALRM (Alarm clock) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> ioctl(7, 0x40086482, 0xbfb62738) = ? ERESTARTSYS (To be restarted)
> and forever.
>
> It seems, that limiting ioctl restarting by some resonable number of trys
> is a dirty but working way to prevent Xorg lockups.
>
> Signed-off-by: Anton V. Boyarshinov <boyarsh@altlinux.org>
> ---
> xf86drm.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/xf86drm.c b/xf86drm.c
> index 6ea068f..9663f21 100644
> --- a/xf86drm.c
> +++ b/xf86drm.c
> @@ -162,10 +162,11 @@ int
> drmIoctl(int fd, unsigned long request, void *arg)
> {
> int ret;
> + int count=0;
>
> do {
> ret = ioctl(fd, request, arg);
> - } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
> + } while (ret == -1 && (errno == EINTR || errno == EAGAIN) && ++count < 100 );
We rely on restarting after signals when blocking for the gpu, busy gpu
plus mouse wiggling can easily reach that.
NACKed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Obviously if we have a dead gpu, we need to break out of this loop. But
detecting a dead gpu (and returning an appropriate error like EIO) is the
kernel's job.
Cheers, Daniel
> return ret;
> }
>
> --
> 1.7.5.4
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xf86drm.c: add counter for ioctl restarting
2012-04-13 13:42 ` Daniel Vetter
@ 2012-04-13 14:45 ` Ville Syrjälä
2012-04-16 8:54 ` Anton V. Boyarshinov
1 sibling, 0 replies; 8+ messages in thread
From: Ville Syrjälä @ 2012-04-13 14:45 UTC (permalink / raw)
To: Daniel Vetter; +Cc: DRI mailing list
On Fri, Apr 13, 2012 at 03:42:16PM +0200, Daniel Vetter wrote:
> On Fri, Apr 13, 2012 at 05:26:42PM +0400, Anton V. Boyarshinov wrote:
> > In some cases ioclt->alarm->ioctl loop can be infinite:
> > ioctl(7, 0x40086482, 0xbfb62738) = ? ERESTARTSYS (To be restarted)
> > --- SIGALRM (Alarm clock) @ 0 (0) ---
> > sigreturn() = ? (mask now [])
> > ioctl(7, 0x40086482, 0xbfb62738) = ? ERESTARTSYS (To be restarted)
> > and forever.
> >
> > It seems, that limiting ioctl restarting by some resonable number of trys
> > is a dirty but working way to prevent Xorg lockups.
> >
> > Signed-off-by: Anton V. Boyarshinov <boyarsh@altlinux.org>
> > ---
> > xf86drm.c | 3 ++-
> > 1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/xf86drm.c b/xf86drm.c
> > index 6ea068f..9663f21 100644
> > --- a/xf86drm.c
> > +++ b/xf86drm.c
> > @@ -162,10 +162,11 @@ int
> > drmIoctl(int fd, unsigned long request, void *arg)
> > {
> > int ret;
> > + int count=0;
> >
> > do {
> > ret = ioctl(fd, request, arg);
> > - } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
> > + } while (ret == -1 && (errno == EINTR || errno == EAGAIN) && ++count < 100 );
>
> We rely on restarting after signals when blocking for the gpu, busy gpu
> plus mouse wiggling can easily reach that.
>
> NACKed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Obviously if we have a dead gpu, we need to break out of this loop. But
> detecting a dead gpu (and returning an appropriate error like EIO) is the
> kernel's job.
The problem with EINTR is that someone else could be poking at the
device at the same time, causing the restarted ioctls to wait some
more, and again be interrupted by a signal. Or the hardware could
be itself responsible for this problem.
I ran into this issue once with a "wait for vblank" ioctl,
which had a fixed relative timeout value. So every time I would
start to wait, a signal would arrive causing the vblank to be
missed. The restarted ioctl would then start waiting using the
original timeout, allowing the signal to interrupt it again.
Any syscall which has a relative timeout should be desgigned so that
the timeout gets updated by the kernel when interrupted, so that the
originally specified timeout would really be the total timeout.
--
Ville Syrjälä
Intel OTC
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xf86drm.c: add counter for ioctl restarting
2012-04-13 13:40 ` Chris Wilson
@ 2012-04-16 8:45 ` Anton V. Boyarshinov
0 siblings, 0 replies; 8+ messages in thread
From: Anton V. Boyarshinov @ 2012-04-16 8:45 UTC (permalink / raw)
To: Chris Wilson; +Cc: DRI mailing list
Hello
> > It seems, that limiting ioctl restarting by some resonable number of trys
> > is a dirty but working way to prevent Xorg lockups.
>
> And you have audited all callpaths to make sure that they can handle
> EINTR? Up until now it was part of the libdrm api that it did not return
> EINTR...
Speaking strictly no but this code works fine on my Nvidia ION box
with nouveau about a 4 monthes without user-visible problems. It also
have some logging code that shows that all ioctls which have been
restarted more then 5 times are restarting forewer (actually 100
times), but this logging code really unpstreamable.
> From my naive pov, we could just fix the root cause of the bug rather
> than escalating the bug into a random failure.
Yes. This is a dirty trick. But from the user scope, Xorg lockup is a
nasty thing. I have some lockups in a day a year ago, then some bugs
was fixed and a have some lockups in a week. Better, but not so good.
With this patch i have no lockups.
And, from the other side, this patch can be a tool for finding real
bugs if someone add proper logging to if (i don't know the right way to
log problems from libdrm) with ioctl, parameters and restart count.
Users will be able to add this informaition to the bugreports without
accessing by network to locked box.
Regards
Anton
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xf86drm.c: add counter for ioctl restarting
2012-04-13 13:42 ` Daniel Vetter
2012-04-13 14:45 ` Ville Syrjälä
@ 2012-04-16 8:54 ` Anton V. Boyarshinov
2012-04-16 14:12 ` Adam Jackson
1 sibling, 1 reply; 8+ messages in thread
From: Anton V. Boyarshinov @ 2012-04-16 8:54 UTC (permalink / raw)
To: Daniel Vetter; +Cc: DRI mailing list
Hi
> Obviously if we have a dead gpu, we need to break out of this loop. But
> detecting a dead gpu (and returning an appropriate error like EIO) is the
> kernel's job.
In my case gpu isn't really dead. It works after some ioctl skip. I
understend that it is a driver bug in any case, but i thing that working
around this bugs with proper logging (absent in this patch) much better
than lockup.
Regards,
Anton
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xf86drm.c: add counter for ioctl restarting
2012-04-16 8:54 ` Anton V. Boyarshinov
@ 2012-04-16 14:12 ` Adam Jackson
2012-04-16 15:05 ` Anton V. Boyarshinov
0 siblings, 1 reply; 8+ messages in thread
From: Adam Jackson @ 2012-04-16 14:12 UTC (permalink / raw)
To: Anton V. Boyarshinov; +Cc: DRI mailing list
[-- Attachment #1.1: Type: text/plain, Size: 629 bytes --]
On Mon, 2012-04-16 at 12:54 +0400, Anton V. Boyarshinov wrote:
> Hi
>
> > Obviously if we have a dead gpu, we need to break out of this loop. But
> > detecting a dead gpu (and returning an appropriate error like EIO) is the
> > kernel's job.
> In my case gpu isn't really dead. It works after some ioctl skip. I
> understend that it is a driver bug in any case, but i thing that working
> around this bugs with proper logging (absent in this patch) much better
> than lockup.
But your workaround isn't harmless. It adds new failure modes for other
cases that currently work. That's worse, not better.
- ajax
[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
[-- Attachment #2: Type: text/plain, Size: 159 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xf86drm.c: add counter for ioctl restarting
2012-04-16 14:12 ` Adam Jackson
@ 2012-04-16 15:05 ` Anton V. Boyarshinov
0 siblings, 0 replies; 8+ messages in thread
From: Anton V. Boyarshinov @ 2012-04-16 15:05 UTC (permalink / raw)
To: Adam Jackson; +Cc: DRI mailing list
Hi
> > around this bugs with proper logging (absent in this patch) much better
> > than lockup.
>
> But your workaround isn't harmless. It adds new failure modes for other
> cases that currently work. That's worse, not better.
I think that from user scope nothing (including Xorg crash) is worth
then lock with 2 abilities: reset or network login and kill -9 Xorg.
I think that even if ioctl's should be restarted forever as present,
some error detecting and logging code should be in this place.
Regards,
Anton
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2012-04-16 15:05 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-04-13 13:26 [PATCH] xf86drm.c: add counter for ioctl restarting Anton V. Boyarshinov
2012-04-13 13:40 ` Chris Wilson
2012-04-16 8:45 ` Anton V. Boyarshinov
2012-04-13 13:42 ` Daniel Vetter
2012-04-13 14:45 ` Ville Syrjälä
2012-04-16 8:54 ` Anton V. Boyarshinov
2012-04-16 14:12 ` Adam Jackson
2012-04-16 15:05 ` Anton V. Boyarshinov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.