* [patch] Altix BTE error handling fix
From: Russ Anderson @ 2005-01-10 23:33 UTC
To: linux-ia64
This patch fixes off-node BTE error handling.
Signed-off-by: Russ Anderson <rja@sgi.com>
------------------------------------------------------------------
Index: 2.6.11/arch/ia64/sn/kernel/bte_error.c
===================================================================
--- 2.6.11.orig/arch/ia64/sn/kernel/bte_error.c 2005-01-10 15:39:36.813394456 -0600
+++ 2.6.11/arch/ia64/sn/kernel/bte_error.c 2005-01-10 16:06:53.244516756 -0600
@@ -47,6 +47,7 @@
ii_icrb0_d_u_t icrbd; /* II CRB Register D */
ii_ibcr_u_t ibcr;
ii_icmr_u_t icmr;
+ ii_ieclr_u_t ieclr;

BTE_PRINTK(("bte_error_handler(%p) - %d\n", err_nodepda,
smp_processor_id()));
@@ -131,6 +132,14 @@
imem.ii_imem_fld_s.i_b0_esd = imem.ii_imem_fld_s.i_b1_esd = 1;
REMOTE_HUB_S(nasid, IIO_IMEM, imem.ii_imem_regval);

+ /* Clear BTE0/1 error bits */
+ ieclr.ii_ieclr_regval = 0;
+ if (err_nodepda->bte_if[0].bh_error != BTE_SUCCESS)
+ ieclr.ii_ieclr_fld_s.i_e_bte_0 = 1;
+ if (err_nodepda->bte_if[1].bh_error != BTE_SUCCESS)
+ ieclr.ii_ieclr_fld_s.i_e_bte_1 = 1;
+ REMOTE_HUB_S(nasid, IIO_IECLR, ieclr.ii_ieclr_regval);
+
/* Reinitialize both BTE state machines. */
ibcr.ii_ibcr_regval = REMOTE_HUB_L(nasid, IIO_IBCR);
ibcr.ii_ibcr_fld_s.i_soft_reset = 1;
@@ -152,7 +161,7 @@
err_nodepda->bte_if[i].cleanup_active = 0;
BTE_PRINTK(("eh:%p:%d Unlocked %d\n", err_nodepda,
smp_processor_id(), i));
- spin_unlock(&pda->cpu_bte_if[i]->spinlock);
+ spin_unlock(&err_nodepda->bte_if[i].spinlock);
}
del_timer(recovery_timer);
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@sgi.com
* Re: [patch] Altix BTE error handling fix
From: Robin Holt @ 2005-01-11 12:03 UTC
To: linux-ia64
On Mon, Jan 10, 2005 at 05:33:19PM -0600, Russ Anderson wrote:
> This patch fixes off-node BTE error handling.
>
> Signed-off-by: Russ Anderson <rja@sgi.com>
>
> ------------------------------------------------------------------
> Index: 2.6.11/arch/ia64/sn/kernel/bte_error.c
> ===================================================================
> --- 2.6.11.orig/arch/ia64/sn/kernel/bte_error.c 2005-01-10 15:39:36.813394456 -0600
> +++ 2.6.11/arch/ia64/sn/kernel/bte_error.c 2005-01-10 16:06:53.244516756 -0600
> @@ -47,6 +47,7 @@
> ii_icrb0_d_u_t icrbd; /* II CRB Register D */
> ii_ibcr_u_t ibcr;
> ii_icmr_u_t icmr;
> + ii_ieclr_u_t ieclr;
>
> BTE_PRINTK(("bte_error_handler(%p) - %d\n", err_nodepda,
> smp_processor_id()));
> @@ -131,6 +132,14 @@
> imem.ii_imem_fld_s.i_b0_esd = imem.ii_imem_fld_s.i_b1_esd = 1;
> REMOTE_HUB_S(nasid, IIO_IMEM, imem.ii_imem_regval);
>
> + /* Clear BTE0/1 error bits */
> + ieclr.ii_ieclr_regval = 0;
> + if (err_nodepda->bte_if[0].bh_error != BTE_SUCCESS)
> + ieclr.ii_ieclr_fld_s.i_e_bte_0 = 1;
> + if (err_nodepda->bte_if[1].bh_error != BTE_SUCCESS)
> + ieclr.ii_ieclr_fld_s.i_e_bte_1 = 1;
> + REMOTE_HUB_S(nasid, IIO_IECLR, ieclr.ii_ieclr_regval);
> +
Hardware will clear those when you reinitialize the state machines.
> /* Reinitialize both BTE state machines. */
> ibcr.ii_ibcr_regval = REMOTE_HUB_L(nasid, IIO_IBCR);
> ibcr.ii_ibcr_fld_s.i_soft_reset = 1;
> @@ -152,7 +161,7 @@
> err_nodepda->bte_if[i].cleanup_active = 0;
> BTE_PRINTK(("eh:%p:%d Unlocked %d\n", err_nodepda,
> smp_processor_id(), i));
> - spin_unlock(&pda->cpu_bte_if[i]->spinlock);
> + spin_unlock(&err_nodepda->bte_if[i].spinlock);
I thought this went in months ago. I guess I need to check on
where this change currently is.
* Re: [patch] Altix BTE error handling fix
From: Robin Holt @ 2005-01-11 12:12 UTC
To: linux-ia64
>
> I thought this went in months ago. I guess I need to check on
> where this change currently is.
The I/O reorg patch blew this away. Let's talk about the ieclr thing
when you and I get into the office today, and then resubmit. This
will be the third time we have submitted the spin_unlock() fix :(
Sorry for being rude in the earlier post.
Robin
* Re: [patch] Altix BTE error handling fix
From: Jesse Barnes @ 2005-01-11 17:49 UTC
To: linux-ia64
On Tuesday, January 11, 2005 4:12 am, Robin Holt wrote:
> > I thought this went in months ago. I guess I need to check on
> > where this change currently is.
>
> The I/O reorg patch blew this away. Let's talk about the ieclr thing
> when you and I get into the office today, and then resubmit. This
> will be the third time we have submitted the spin_unlock() fix :(
Ugh. That reorg really shouldn't have touched much beyond the I/O files, but
for some reason it touched nearly everything in the tree. I mistakenly assumed
it was just running Lindent over everything, which would have been fine
(unfortunately, real changes mixed in with Lindent changes are pretty hard to spot).
Sorry about that, I'll keep a closer eye on things.
Jesse
* Re: [patch] Altix BTE error handling fix
From: Robin Holt @ 2005-01-11 18:26 UTC
To: linux-ia64
On Tue, Jan 11, 2005 at 09:49:02AM -0800, Jesse Barnes wrote:
> On Tuesday, January 11, 2005 4:12 am, Robin Holt wrote:
> > > I thought this went in months ago. I guess I need to check on
> > > where this change currently is.
> >
> > The I/O reorg patch blew this away. Let's talk about the ieclr thing
> > when you and I get into the office today, and then resubmit. This
> > will be the third time we have submitted the spin_unlock() fix :(
>
> Ugh. That reorg really shouldn't have touched much beyond the I/O files, but
> for some reason it touched nearly everything in the tree. I mistakenly assumed
> it was just running Lindent over everything, which would have been fine
> (unfortunately, real changes mixed in with Lindent changes are pretty hard to spot).
> Sorry about that, I'll keep a closer eye on things.
It was a hard change in which to catch this sort of thing. The I/O
group was maintaining their patches out of sync with the kernel BK
tree. We checked changes into the BK tree. They removed that file
and moved it to a new directory. When they hit the conflict, they
just assumed their version was correct even though it was out of
date. Those things happen. My only regret is that we do not have the
time to routinely and fully test the error handling in the BTE code.
We don't even have the time to put together an automated test.
Sort of stinks.
I have not had a chance to talk with Russ. I think his patch was
actually correct and I overreacted. I just want to verify one
thing with him first.
Thanks,
Robin
* Re: [patch] Altix BTE error handling fix
From: Russ Anderson @ 2005-01-11 18:38 UTC
To: linux-ia64
Robin Holt wrote:
>
> I have not had a chance to talk with Russ. I think his patch was
> actually correct and I overreacted. I just want to verify one
> thing with him first.
Talked with Robin. This is the right patch. Life is good.
Signed-off-by: Russ Anderson <rja@sgi.com>
--------------------------------------------------------------
Index: 2.6.11/arch/ia64/sn/kernel/bte_error.c
===================================================================
--- 2.6.11.orig/arch/ia64/sn/kernel/bte_error.c 2005-01-10 15:39:36.813394456 -0600
+++ 2.6.11/arch/ia64/sn/kernel/bte_error.c 2005-01-10 16:06:53.244516756 -0600
@@ -47,6 +47,7 @@
ii_icrb0_d_u_t icrbd; /* II CRB Register D */
ii_ibcr_u_t ibcr;
ii_icmr_u_t icmr;
+ ii_ieclr_u_t ieclr;

BTE_PRINTK(("bte_error_handler(%p) - %d\n", err_nodepda,
smp_processor_id()));
@@ -131,6 +132,14 @@
imem.ii_imem_fld_s.i_b0_esd = imem.ii_imem_fld_s.i_b1_esd = 1;
REMOTE_HUB_S(nasid, IIO_IMEM, imem.ii_imem_regval);

+ /* Clear BTE0/1 error bits */
+ ieclr.ii_ieclr_regval = 0;
+ if (err_nodepda->bte_if[0].bh_error != BTE_SUCCESS)
+ ieclr.ii_ieclr_fld_s.i_e_bte_0 = 1;
+ if (err_nodepda->bte_if[1].bh_error != BTE_SUCCESS)
+ ieclr.ii_ieclr_fld_s.i_e_bte_1 = 1;
+ REMOTE_HUB_S(nasid, IIO_IECLR, ieclr.ii_ieclr_regval);
+
/* Reinitialize both BTE state machines. */
ibcr.ii_ibcr_regval = REMOTE_HUB_L(nasid, IIO_IBCR);
ibcr.ii_ibcr_fld_s.i_soft_reset = 1;
@@ -152,7 +161,7 @@
err_nodepda->bte_if[i].cleanup_active = 0;
BTE_PRINTK(("eh:%p:%d Unlocked %d\n", err_nodepda,
smp_processor_id(), i));
- spin_unlock(&pda->cpu_bte_if[i]->spinlock);
+ spin_unlock(&err_nodepda->bte_if[i].spinlock);
}
del_timer(recovery_timer);
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@sgi.com