public inbox for linux-kernel@vger.kernel.org
From: Ondrej Zary <linux@rainbow-software.org>
To: Finn Thain <fthain@telegraphics.com.au>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Schmitz <schmitzmic@gmail.com>
Subject: Re: [PATCH v6 0/6] g_NCR5380: PDMA fixes and cleanup
Date: Sun, 2 Jul 2017 16:51:36 +0200
Message-ID: <201707021651.37016.linux@rainbow-software.org>
In-Reply-To: <alpine.LNX.2.00.1707021008110.2389@nippy.intranet>

On Sunday 02 July 2017 05:11:27 Finn Thain wrote:
> On Sat, 1 Jul 2017, Ondrej Zary wrote:
> > The write corruption is still present - "start" must be rolled back in
> > both IRQ and timeout cases.
>
> Your original algorithm aborts the transfer for a timeout. Same with mine.

I do "start -= 2 * 128" even after a timeout.

> The bug must be elsewhere.
>
> > And 128 B is not enough; 256 is OK (why did it work last time?).
>
> When I get contradictory results it usually means I booted the wrong build
> or built the wrong branch.

I've just retested PATCH v5: it really does lose 128 bytes, and it works
if I add "residual += 128;".

> Actually, I think that adding 128 to the residual is correct in some
> situations, and 256 is correct in other situations.
>
> > We just wrote a buffer to the chip but the chip is writing the previous
> > one to the drive - so if a problem arises, both buffers are lost.
>
> I see. I guess we have to take buffer swaps into account.
>
> > This fixes the corruption (although the "start > 0" check seems wrong
> > now):
> >
> > --- a/drivers/scsi/g_NCR5380.c
> > +++ b/drivers/scsi/g_NCR5380.c
> > @@ -598,23 +598,17 @@ static inline int generic_NCR5380_psend(struct NCR5380_hostdata *hostdata,
> >  		                           CSR_HOST_BUF_NOT_RDY, 0,
> >  		                           hostdata->c400_ctl_status,
> >  		                           CSR_GATED_53C80_IRQ,
> > -		                           CSR_GATED_53C80_IRQ, HZ / 64) < 0)
> > -                       break;
> > -
> > -		if (NCR5380_read(hostdata->c400_ctl_status) &
> > -		    CSR_HOST_BUF_NOT_RDY) {
> > +		                           CSR_GATED_53C80_IRQ, HZ / 64) < 0 ||
> > +		    (NCR5380_read(hostdata->c400_ctl_status) &
> > +		     (CSR_HOST_BUF_NOT_RDY | CSR_GATED_53C80_IRQ))) {
>
> You could add a printk to the timeout branch. If it executes, something is
> seriously wrong. E.g.
>
> -	break;
> +	{ pr_err("send timeout %02x, %d/%d\n",
> +	         NCR5380_read(hostdata->c400_ctl_status), start, len); break; }

Yes, timeouts do happen:
[ 9671.909223] send timeout 14, 3840/4096
[ 9672.978079] send timeout 14, 2816/4096
[ 9675.323751] send timeout 14, 1280/4096
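
As a reading aid for the status byte in these messages, here is a minimal sketch of how 0x14 decodes. The bit values are an assumption (believed to match the CSR_* definitions in g_NCR5380.h; please check them against the header in your tree), and csr_host_buf_busy_no_irq() is a hypothetical helper, not driver code.

```c
/* Assumed 53c400 control/status register bits; verify against g_NCR5380.h. */
#define CSR_53C80_INTR        0x10  /* 53c80 interrupts enabled */
#define CSR_HOST_BUF_NOT_RDY  0x04  /* host buffer not ready */
#define CSR_GATED_53C80_IRQ   0x01  /* gated 53c80 IRQ pending */

/*
 * Hypothetical helper: returns 1 when the status byte shows the host
 * buffer still not ready and no gated IRQ pending, 0 otherwise.
 */
static int csr_host_buf_busy_no_irq(unsigned int csr)
{
	return (csr & CSR_HOST_BUF_NOT_RDY) && !(csr & CSR_GATED_53C80_IRQ);
}
```

Under those assumptions, 0x14 is CSR_53C80_INTR | CSR_HOST_BUF_NOT_RDY with CSR_GATED_53C80_IRQ clear, so in all three timeouts the host buffer was still not ready and no IRQ was pending.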

> >  			/* The chip has done a 128 B buffer swap but the first
> >  			 * buffer still has not reached the SCSI bus.
> >  			 */
> >  			if (start > 0)
> > -				start -= 128;
> > +				start -= 256;
> >  			break;
> >  		}
>
> BTW, that change carries the risk of 'start' going negative and the
> residual exceeding the length of the original transfer.
>
> But I agree with you that there's a problem with the residual.
>
> If I understand correctly, the 53c400 can't do a buffer swap until the
> disk acknowledges each of the 128 bytes from the buffer. But I guess the
> first buffer is special because the disk will not see the first byte of
> the transfer until after the first buffer swap.
>
> And it appears that the last buffer is also special: we have to wait for
> CSR_HOST_BUF_NOT_RDY even after start == len otherwise we may not detect a
> failure and fix the residual. So I think the datasheet is right; we have
> to iterate until the block counter goes to zero.
>
> I think it is safe to say that when CSR_HOST_BUF_NOT_RDY, 'start' is
> between 128 and 256 B ahead of the disk. Otherwise, the host buffer is
> empty and 'start' is no more than 128 B ahead of the disk.
>
> > -		if (NCR5380_read(hostdata->c400_ctl_status) &
> > -		    CSR_GATED_53C80_IRQ)
> > -			break;
> > -
> >  		if (hostdata->io_port && hostdata->io_width == 2)
> >  			outsw(hostdata->io_port + hostdata->c400_host_buf,
> >  			      src + start, 64);
> >
> >
> > DTC seems to work too.
>
> OK. Thanks for testing. Please try the patch below on top of v6.

It loses 256 B blocks. That is caused by the timeouts; this patch fixes it:

--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -598,11 +598,9 @@ static inline int generic_NCR5380_psend(struct NCR5380_hostdata *hostdata,
 		                           CSR_HOST_BUF_NOT_RDY, 0,
 		                           hostdata->c400_ctl_status,
 		                           CSR_GATED_53C80_IRQ,
-		                           CSR_GATED_53C80_IRQ, HZ / 64) < 0)
-			break;
-
-		if (NCR5380_read(hostdata->c400_ctl_status) &
-		    CSR_HOST_BUF_NOT_RDY) {
+		                           CSR_GATED_53C80_IRQ, HZ / 64) < 0 ||
+		    (NCR5380_read(hostdata->c400_ctl_status) &
+		     CSR_HOST_BUF_NOT_RDY)) {
 			/* Both 128 B buffers are in use */
 			if (start >= 128)
 				start -= 128;
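
To illustrate the concern about 'start' going negative, here is a small stand-alone sketch comparing the two rollback variants discussed in this thread. It assumes the residual is reported as len - start; the function names are hypothetical and exist only for this illustration.

```c
/* Earlier hunk: "if (start > 0) start -= 256;" (guard too weak for 256). */
static int rollback_unguarded(int start)
{
	if (start > 0)
		start -= 256;
	return start;
}

/* This patch: "if (start >= 128) start -= 128;" (can never go below 0). */
static int rollback_clamped(int start)
{
	if (start >= 128)
		start -= 128;
	return start;
}

/* Assumed residual calculation: bytes not yet known to be transferred. */
static int pdma_residual(int len, int start)
{
	return len - start;
}
```

With len = 4096 and a failure right after the first 128 B buffer was written (start == 128), the unguarded variant yields start == -128 and a residual of 4224, which exceeds the transfer length. The clamped variant yields start == 0 and a residual of 4096.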


-- 
Ondrej Zary

