linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* hvc_console change results in corrupt oops output
@ 2011-07-04 10:57 Anton Blanchard
  2011-07-04 13:56 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: Anton Blanchard @ 2011-07-04 10:57 UTC (permalink / raw)
  To: benh, brueckner, borntraeger, linuxppc-dev


Hi,

We've been struggling to debug a hang on a large ppc64 box. Every time
we collect oops output there are pieces of the oops output missing and
in some cases entire CPUs are missing.

Eventually I realised the hvc_console driver is dropping characters.
The commit that caused this is:


commit 3feebbb5492e9e463467cefb633e23a3dfcec132
Author: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Date:   Mon Oct 13 23:12:50 2008 +0000

    hvc_console: Fix loop if put_char() returns 0
    
    If put_char() routine of a hvc console backend returns 0, then the
    hvc console starts looping in the following scenarios:
    
    1. hvc_console_print()
        If put_char() returns 0 then the while loop may loop forever.
        I have added the missing check for 0 to throw away console
        messages.


The hypervisor gives us a busy return, so we could retry a number of
times instead of dropping it on the floor. We'd need to do it in the
hvc_console driver - the tty drivers share the same backend
functions so we can't hide it in the pseries put_chars function.

Anton

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: hvc_console change results in corrupt oops output
  2011-07-04 10:57 hvc_console change results in corrupt oops output Anton Blanchard
@ 2011-07-04 13:56 ` Benjamin Herrenschmidt
  2011-07-04 14:24   ` Hendrik Brueckner
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2011-07-04 13:56 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: borntraeger, brueckner, linuxppc-dev

On Mon, 2011-07-04 at 20:57 +1000, Anton Blanchard wrote:

 .../...

> The hypervisor gives us a busy return, so we could retry a number of
> times instead of dropping it on the floor. We'd need to do it in the
> hvc_console driver - the tty drivers share the same backend
> functions so we can't hide it in the pseries put_chars function.

For kernel console, I don't see why not wait forever ... If the
underlying backend really lost the connection it can always return a
negative error no ?

Or we could have "well defined" return codes for "wait forever" vs.
"drop this" ... maybe return -EAGAIN.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: hvc_console change results in corrupt oops output
  2011-07-04 13:56 ` Benjamin Herrenschmidt
@ 2011-07-04 14:24   ` Hendrik Brueckner
  2011-07-05  4:17     ` Tabi Timur-B04825
  0 siblings, 1 reply; 10+ messages in thread
From: Hendrik Brueckner @ 2011-07-04 14:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: borntraeger, brueckner, linuxppc-dev, Anton Blanchard

On Mon, Jul 04, 2011 at 11:56:27PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2011-07-04 at 20:57 +1000, Anton Blanchard wrote:
> 
>  .../...
> 
> > The hypervisor gives us a busy return, so we could retry a number of
> > times instead of dropping it on the floor. We'd need to do it in the
> > hvc_console driver - the tty drivers share the same backend
> > functions so we can't hide it in the pseries put_chars function.
> 
> For kernel console, I don't see why not wait forever ... If the
> underlying backend really lost the connection it can always return a
> negative error no ?
> 
> Or we could have "well defined" return codes for "wait forever" vs.
> "drop this" ... maybe return -EAGAIN.

I will check this again for my hvc_iucv back-end.  Meanwhile a found
an old thread discussing the same issue.  It covers some differences
between console and ttys which actually does not matter for hvc-backend
because of the shared put_chars() routine.

You can read the thread on lkml.org: http://lkml.org/lkml/2009/10/15/149


Kind regards,
  Hendrik

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: hvc_console change results in corrupt oops output
  2011-07-04 14:24   ` Hendrik Brueckner
@ 2011-07-05  4:17     ` Tabi Timur-B04825
  2011-07-05  6:22       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: Tabi Timur-B04825 @ 2011-07-05  4:17 UTC (permalink / raw)
  To: Hendrik Brueckner
  Cc: linuxppc-dev@lists.ozlabs.org, Anton Blanchard,
	borntraeger@de.ibm.com

On Mon, Jul 4, 2011 at 9:24 AM, Hendrik Brueckner
<brueckner@linux.vnet.ibm.com> wrote:

> I will check this again for my hvc_iucv back-end. =A0Meanwhile a found
> an old thread discussing the same issue. =A0It covers some differences
> between console and ttys which actually does not matter for hvc-backend
> because of the shared put_chars() routine.
>
> You can read the thread on lkml.org: http://lkml.org/lkml/2009/10/15/149

I started that thread.  After much soul searching, we came to the
conclusion that HVC is not compatible with hypervisors that return
BUSY on writes.  So I threw out my HVC driver and rewrote it as a
standard console and TTY driver.  That driver is waiting to be
included in the 3.1 kernel.

--=20
Timur Tabi
Linux kernel developer at Freescale=

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: hvc_console change results in corrupt oops output
  2011-07-05  4:17     ` Tabi Timur-B04825
@ 2011-07-05  6:22       ` Benjamin Herrenschmidt
  2011-07-05 13:51         ` Tabi Timur-B04825
  2011-07-05 14:28         ` [RFC] [PATCH] hvc_console: improve tty/console put_chars handling Hendrik Brueckner
  0 siblings, 2 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2011-07-05  6:22 UTC (permalink / raw)
  To: Tabi Timur-B04825
  Cc: borntraeger@de.ibm.com, Hendrik Brueckner,
	linuxppc-dev@lists.ozlabs.org, Anton Blanchard

On Tue, 2011-07-05 at 04:17 +0000, Tabi Timur-B04825 wrote:
> On Mon, Jul 4, 2011 at 9:24 AM, Hendrik Brueckner
> <brueckner@linux.vnet.ibm.com> wrote:
> 
> > I will check this again for my hvc_iucv back-end.  Meanwhile a found
> > an old thread discussing the same issue.  It covers some differences
> > between console and ttys which actually does not matter for hvc-backend
> > because of the shared put_chars() routine.
> >
> > You can read the thread on lkml.org: http://lkml.org/lkml/2009/10/15/149
> 
> I started that thread.  After much soul searching, we came to the
> conclusion that HVC is not compatible with hypervisors that return
> BUSY on writes. 

That is a fun conclusion considering that hvc has been written for the
pseries hypervisor which ... can return BUSY on writes :-)

> So I threw out my HVC driver and rewrote it as a
> standard console and TTY driver.  That driver is waiting to be
> included in the 3.1 kernel.

Sucktastic.

We just need to fix HVC properly.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: hvc_console change results in corrupt oops output
  2011-07-05  6:22       ` Benjamin Herrenschmidt
@ 2011-07-05 13:51         ` Tabi Timur-B04825
  2011-07-05 14:28         ` [RFC] [PATCH] hvc_console: improve tty/console put_chars handling Hendrik Brueckner
  1 sibling, 0 replies; 10+ messages in thread
From: Tabi Timur-B04825 @ 2011-07-05 13:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: borntraeger@de.ibm.com, Hendrik Brueckner,
	linuxppc-dev@lists.ozlabs.org, Anton Blanchard

Benjamin Herrenschmidt wrote:

> That is a fun conclusion considering that hvc has been written for the
> pseries hypervisor which ... can return BUSY on writes :-)

Go read the original thread.  The problem is that tty writes and console=20
writes are treated the same by the hvc client driver.  If a client driver=20
detects that the hypervisor is busy, it has the choice of either spinning=20
or returning right away.  Spinning is not acceptable for tty output, so=20
all drivers return right away.  hvc then drops the unwritten characters.

According to Hendrik, this is still happening.

> We just need to fix HVC properly.

Where were you two years ago?  I complained about the problem, and even=20
posted a hackish "fix".  The response I got was tepid -- some=20
acknowledgement that the problem exists, but no real desire to fix it by=20
anyone.

So I had no choice but to abandon hvc.  And frankly, I still don't=20
understand why it exists.  Since then, I wrote a very nice console/tty=20
driver, and I have no plans to return to hvc even if the problem is fixed.

--=20
Timur Tabi
Linux kernel developer at Freescale=

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC] [PATCH] hvc_console: improve tty/console put_chars handling
  2011-07-05  6:22       ` Benjamin Herrenschmidt
  2011-07-05 13:51         ` Tabi Timur-B04825
@ 2011-07-05 14:28         ` Hendrik Brueckner
  2011-07-06  7:49           ` Anton Blanchard
                             ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Hendrik Brueckner @ 2011-07-05 14:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: borntraeger@de.ibm.com, Hendrik Brueckner, Tabi Timur-B04825,
	linuxppc-dev@lists.ozlabs.org, Anton Blanchard

Hi folks,

On Tue, Jul 05, 2011 at 04:22:43PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2011-07-05 at 04:17 +0000, Tabi Timur-B04825 wrote:
> > On Mon, Jul 4, 2011 at 9:24 AM, Hendrik Brueckner
> > <brueckner@linux.vnet.ibm.com> wrote:
> > 
> > I started that thread.  After much soul searching, we came to the
> > conclusion that HVC is not compatible with hypervisors that return
> > BUSY on writes. 
> 
> That is a fun conclusion considering that hvc has been written for the
> pseries hypervisor which ... can return BUSY on writes :-)
> 
> We just need to fix HVC properly.

So I took initiative and looked again into this issue.  Below you can
find a patch that is based on Ben's -EAGAIN idea.  The hvc console layer
takes care of retrying depending on the backend's return code.

However, the main issue is that from a backend perspective, there is no
difference between tty and console output.  Because consoles, especially
the preferred console, behave different than a simple tty.  Blocked write
to a preferred console can stop the system.

So with the patch below, the backend can now indirectly control the way
console output is handled for it.  I still have to think if this solution
is ok or if it is better to introduce a new callback to console output only
(and might provide a default implemenatation similar to the patch below).

NOTE: I did not yet test this patch but will do.. I just want to share it
      early to get feedback from you.

-->8---------------------------------------------------------------------

Currently, the hvc_console_print() function drops console output if the
hvc backend's put_chars() returns 0.  This patch changes this behavior
to allow a retry through returning -EAGAIN.

This change also affects the hvc_push() function.  Both functions are
changed to handle -EAGAIN and to retry the put_chars() operation.

If a hvc backend returns -EAGAIN, the retry handling differs:

  - hvc_console_print() spins to write the complete console output.
  - hvc_push() behaves the same way as for returning 0.

Now hvc backends can indirectly control the way how console output is
handled through the hvc console layer.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
---
 drivers/tty/hvc/hvc_console.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -163,8 +163,10 @@ static void hvc_console_print(struct con
 		} else {
 			r = cons_ops[index]->put_chars(vtermnos[index], c, i);
 			if (r <= 0) {
-				/* throw away chars on error */
-				i = 0;
+				/* throw away characters on error
+				 * but spin in case of -EAGAIN */
+				if (r != -EAGAIN)
+					i = 0;
 			} else if (r > 0) {
 				i -= r;
 				if (i > 0)
@@ -448,7 +450,7 @@ static int hvc_push(struct hvc_struct *h
 
 	n = hp->ops->put_chars(hp->vtermno, hp->outbuf, hp->n_outbuf);
 	if (n <= 0) {
-		if (n == 0) {
+		if (n == 0 || n == -EAGAIN) {
 			hp->do_wakeup = 1;
 			return 0;
 		}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] [PATCH] hvc_console: improve tty/console put_chars handling
  2011-07-05 14:28         ` [RFC] [PATCH] hvc_console: improve tty/console put_chars handling Hendrik Brueckner
@ 2011-07-06  7:49           ` Anton Blanchard
  2011-07-06  7:50           ` [PATCH 1/2]: " Anton Blanchard
  2011-07-06  7:51           ` [PATCH 2/2]: powerpc/pseries/hvconsole: Fix dropped console output Anton Blanchard
  2 siblings, 0 replies; 10+ messages in thread
From: Anton Blanchard @ 2011-07-06  7:49 UTC (permalink / raw)
  To: Hendrik Brueckner
  Cc: Tabi Timur-B04825, linuxppc-dev@lists.ozlabs.org,
	borntraeger@de.ibm.com


Hi Hendrik,

> So with the patch below, the backend can now indirectly control the
> way console output is handled for it.  I still have to think if this
> solution is ok or if it is better to introduce a new callback to
> console output only (and might provide a default implemenatation
> similar to the patch below).
> 
> NOTE: I did not yet test this patch but will do.. I just want to
> share it early to get feedback from you.

Thanks a lot for this. I added the pseries bits and tested it, patches
to follow.

Anton

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2]: hvc_console: improve tty/console put_chars handling
  2011-07-05 14:28         ` [RFC] [PATCH] hvc_console: improve tty/console put_chars handling Hendrik Brueckner
  2011-07-06  7:49           ` Anton Blanchard
@ 2011-07-06  7:50           ` Anton Blanchard
  2011-07-06  7:51           ` [PATCH 2/2]: powerpc/pseries/hvconsole: Fix dropped console output Anton Blanchard
  2 siblings, 0 replies; 10+ messages in thread
From: Anton Blanchard @ 2011-07-06  7:50 UTC (permalink / raw)
  To: Hendrik Brueckner
  Cc: Tabi Timur-B04825, linuxppc-dev@lists.ozlabs.org,
	borntraeger@de.ibm.com

From: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>

Currently, the hvc_console_print() function drops console output if the
hvc backend's put_chars() returns 0.  This patch changes this behavior
to allow a retry through returning -EAGAIN.

This change also affects the hvc_push() function.  Both functions are
changed to handle -EAGAIN and to retry the put_chars() operation.

If a hvc backend returns -EAGAIN, the retry handling differs:

  - hvc_console_print() spins to write the complete console output.
  - hvc_push() behaves the same way as for returning 0.

Now hvc backends can indirectly control the way how console output is
handled through the hvc console layer.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
---

Index: linux-powerpc/drivers/tty/hvc/hvc_console.c
===================================================================
--- linux-powerpc.orig/drivers/tty/hvc/hvc_console.c	2011-07-06 17:10:51.288360541 +1000
+++ linux-powerpc/drivers/tty/hvc/hvc_console.c	2011-07-06 17:11:16.398794913 +1000
@@ -163,8 +163,10 @@ static void hvc_console_print(struct con
 		} else {
 			r = cons_ops[index]->put_chars(vtermnos[index], c, i);
 			if (r <= 0) {
-				/* throw away chars on error */
-				i = 0;
+				/* throw away characters on error
+				 * but spin in case of -EAGAIN */
+				if (r != -EAGAIN)
+					i = 0;
 			} else if (r > 0) {
 				i -= r;
 				if (i > 0)
@@ -448,7 +450,7 @@ static int hvc_push(struct hvc_struct *h
 
 	n = hp->ops->put_chars(hp->vtermno, hp->outbuf, hp->n_outbuf);
 	if (n <= 0) {
-		if (n == 0) {
+		if (n == 0 || n == -EAGAIN) {
 			hp->do_wakeup = 1;
 			return 0;
 		}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2]: powerpc/pseries/hvconsole: Fix dropped console output
  2011-07-05 14:28         ` [RFC] [PATCH] hvc_console: improve tty/console put_chars handling Hendrik Brueckner
  2011-07-06  7:49           ` Anton Blanchard
  2011-07-06  7:50           ` [PATCH 1/2]: " Anton Blanchard
@ 2011-07-06  7:51           ` Anton Blanchard
  2 siblings, 0 replies; 10+ messages in thread
From: Anton Blanchard @ 2011-07-06  7:51 UTC (permalink / raw)
  To: Hendrik Brueckner
  Cc: Tabi Timur-B04825, linuxppc-dev@lists.ozlabs.org,
	borntraeger@de.ibm.com


Return -EAGAIN when we get H_BUSY back from the hypervisor. This
makes the hvc console driver retry, avoiding dropped printks.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
---

Index: linux-powerpc/arch/powerpc/platforms/pseries/hvconsole.c
===================================================================
--- linux-powerpc.orig/arch/powerpc/platforms/pseries/hvconsole.c	2011-07-06 17:11:33.799095935 +1000
+++ linux-powerpc/arch/powerpc/platforms/pseries/hvconsole.c	2011-07-06 17:11:47.499332962 +1000
@@ -73,7 +73,7 @@ int hvc_put_chars(uint32_t vtermno, cons
 	if (ret == H_SUCCESS)
 		return count;
 	if (ret == H_BUSY)
-		return 0;
+		return -EAGAIN;
 	return -EIO;
 }
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-07-06  7:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-07-04 10:57 hvc_console change results in corrupt oops output Anton Blanchard
2011-07-04 13:56 ` Benjamin Herrenschmidt
2011-07-04 14:24   ` Hendrik Brueckner
2011-07-05  4:17     ` Tabi Timur-B04825
2011-07-05  6:22       ` Benjamin Herrenschmidt
2011-07-05 13:51         ` Tabi Timur-B04825
2011-07-05 14:28         ` [RFC] [PATCH] hvc_console: improve tty/console put_chars handling Hendrik Brueckner
2011-07-06  7:49           ` Anton Blanchard
2011-07-06  7:50           ` [PATCH 1/2]: " Anton Blanchard
2011-07-06  7:51           ` [PATCH 2/2]: powerpc/pseries/hvconsole: Fix dropped console output Anton Blanchard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).