* Issue with file transfers to a mass storage device on SMP system
@ 2010-07-27 6:34 Maulik
2010-07-27 7:05 ` Ming Lei
2010-07-27 9:38 ` Shilimkar, Santosh
0 siblings, 2 replies; 14+ messages in thread
From: Maulik @ 2010-07-27 6:34 UTC (permalink / raw)
To: linux-arm-kernel
Hello,
I am working on OMAP4 which has two ARM Cortex A9 microprocessors (SMP
system) with 2.6.35-rc5 kernel.
I have an issue whereby transferring a large file (>20MB) over USB (OMAP
Mentor USB controller acts as USB host) to an attached USB thumb drive
fails.
The USB analyzer trace shows that the 31 bytes CBW packet is corrupted and
device responds with a STALL when this issue occurs.
Further it was found that the CBWCB field (the last 16 bytes which contains
the command to be executed by the device) of the CBW packet was Zero in the
failure case. Also the first 15 bytes of the CBW packet contained valid
data.
The code snippet below from usb_stor_Bulk_transport () in
drivers/usb/storage/transport.c looks to be a problem area.
/* copy the command payload */
memset(bcb->CDB, 0, sizeof(bcb->CDB));
memcpy(bcb->CDB, srb->cmnd, bcb->Length);
Looks like when the issue occurs the memory (bcb->CDB) is not yet updated
due to likely out of order execution due to SMP.
I have tried below workarounds that fixes the issue.
(1)Enabling debug prints / adding delays fixes the issue.
(2)Using nosmp in bootargs i.e. disabling SMP fixes the issue.
(3)Using wmb() (Write memory barrier) before starting DMA transfer in MUSB
driver fixes the issue.
(4)Using wmb() for only 31 bytes CBW packets after the memcpy(in storage
stack (transport.c)) fixes the issue.
(5)Checking if last 16 bytes (of the 31 bytes CBW packet) are Zero or not
after the memcpy() also fixes the issue. i.e if you read back the memory,
the memory seems to be updated.
Has anyone seen such issue on a SMP system?
Please comment on the need and the usage of a memory barrier in this case.
Thanks,
Maulik
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 6:34 Issue with file transfers to a mass storage device on SMP system Maulik
@ 2010-07-27 7:05 ` Ming Lei
2010-07-27 9:38 ` Shilimkar, Santosh
1 sibling, 0 replies; 14+ messages in thread
From: Ming Lei @ 2010-07-27 7:05 UTC (permalink / raw)
To: linux-arm-kernel
2010/7/27 Maulik <x0082077@ti.com>:
> Hello,
>
> I am working on OMAP4 which has two ARM Cortex A9 microprocessors (SMP
> system) with 2.6.35-rc5 kernel.
>
> I have an issue whereby transferring a large file (>20MB) over USB (OMAP
> Mentor USB controller acts as USB host) to an attached USB thumb drive
> fails.
>
> The USB analyzer trace shows that the 31 bytes CBW packet is corrupted and
> device responds with a STALL when this issue occurs.
>
> Further it was found that the CBWCB field (the last 16 bytes which contains
> the command to be executed by the device) of the CBW packet was Zero in the
> failure case. Also the first 15 bytes of the CBW packet contained valid
> data.
>
> The code snippet below from usb_stor_Bulk_transport () in
> drivers/usb/storage/transport.c looks to be a problem area.
>
> /* copy the command payload */
>
> ?memset(bcb->CDB, 0, sizeof(bcb->CDB));
> ?memcpy(bcb->CDB, srb->cmnd, bcb->Length);
>
> Looks like when the issue occurs the memory (bcb->CDB) is not yet updated
> due to likely out of order execution due to SMP.
>
> I have tried below workarounds that fixes the issue.
>
> (1)Enabling debug prints / adding delays fixes the issue.
> (2)Using nosmp in bootargs i.e. disabling SMP fixes the issue.
> (3)Using wmb() (Write memory barrier) before starting DMA transfer in MUSB
> driver fixes the issue.
Seems it is one solution.
It should be caused by dma_alloc_coherent, which allocated uncached but
buffered coherent buffer, see discussions below:
http://marc.info/?t=127918539100004&r=1&w=2
> (4)Using wmb() for only 31 bytes CBW packets after the memcpy(in storage
> stack (transport.c)) fixes the issue.
> (5)Checking if last 16 bytes (of the 31 bytes CBW packet) are Zero or not
> after the memcpy() also fixes the issue. i.e if you read back the memory,
> the memory seems to be updated.
>
> Has anyone seen such issue on a SMP system?
>
> Please comment on the need and the usage of a memory barrier in this case.
>
> Thanks,
> Maulik
>
--
Lei Ming
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 6:34 Issue with file transfers to a mass storage device on SMP system Maulik
2010-07-27 7:05 ` Ming Lei
@ 2010-07-27 9:38 ` Shilimkar, Santosh
2010-07-27 10:01 ` Russell King - ARM Linux
1 sibling, 1 reply; 14+ messages in thread
From: Shilimkar, Santosh @ 2010-07-27 9:38 UTC (permalink / raw)
To: linux-arm-kernel
> -----Original Message-----
> From: Mankad, Maulik Ojas
> Sent: Tuesday, July 27, 2010 12:05 PM
> To: linux-usb at vger.kernel.org
> Cc: Shilimkar, Santosh; linux-arm-kernel at lists.infradead.org; linux-
> omap at vger.kernel.org
> Subject: Issue with file transfers to a mass storage device on SMP system
>
> Hello,
>
> I am working on OMAP4 which has two ARM Cortex A9 microprocessors (SMP
> system) with 2.6.35-rc5 kernel.
>
> I have an issue whereby transferring a large file (>20MB) over USB (OMAP
> Mentor USB controller acts as USB host) to an attached USB thumb drive
> fails.
>
> The USB analyzer trace shows that the 31 bytes CBW packet is corrupted and
> device responds with a STALL when this issue occurs.
>
> Further it was found that the CBWCB field (the last 16 bytes which
> contains
> the command to be executed by the device) of the CBW packet was Zero in
> the
> failure case. Also the first 15 bytes of the CBW packet contained valid
> data.
>
> The code snippet below from usb_stor_Bulk_transport () in
> drivers/usb/storage/transport.c looks to be a problem area.
>
> /* copy the command payload */
>
> memset(bcb->CDB, 0, sizeof(bcb->CDB));
> memcpy(bcb->CDB, srb->cmnd, bcb->Length);
>
> Looks like when the issue occurs the memory (bcb->CDB) is not yet updated
> due to likely out of order execution due to SMP.
>
As discussed, the main reason is the cache maintenance isn't done on
"bcb->CDB" buffers and hence the data remains in CPU write buffer instead of the physical memory on which DMA operates.
In working scenario's, there might be a barrier in the path which is helping
you.
Regards,
Santosh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 9:38 ` Shilimkar, Santosh
@ 2010-07-27 10:01 ` Russell King - ARM Linux
2010-07-27 10:19 ` Shilimkar, Santosh
0 siblings, 1 reply; 14+ messages in thread
From: Russell King - ARM Linux @ 2010-07-27 10:01 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jul 27, 2010 at 03:08:54PM +0530, Shilimkar, Santosh wrote:
> As discussed, the main reason is the cache maintenance isn't done on
> "bcb->CDB" buffers and hence the data remains in CPU write buffer
> instead of the physical memory on which DMA operates.
struct bulk_cb_wrap {
__le32 Signature; /* contains 'USBC' */
__u32 Tag; /* unique per command id */
__le32 DataTransferLength; /* size of data */
__u8 Flags; /* direction in bit 0 */
__u8 Lun; /* LUN normally 0 */
__u8 Length; /* of of the CDB */
__u8 CDB[16]; /* max command */
};
So, CDB is contained within bcb...bcb+sizeof(*bcb).
The bcb is passed to usb_stor_bulk_transfer_buf:
result = usb_stor_bulk_transfer_buf(us, us->send_bulk_pipe,
bcb, cbwlen, NULL);
which fills it into a URB:
usb_fill_bulk_urb(us->current_urb, us->pusb_dev, pipe, buf, length,
usb_stor_blocking_completion, NULL);
This sets the URB buffer pointers:
urb->transfer_buffer = transfer_buffer;
urb->transfer_buffer_length = buffer_length;
And this buffer should be dma-mapped and dma-unmapped as appropriate.
Wasn't there an issue with the DMA mapping being used with a PIO USB
host recently? Is that the problem here?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 10:01 ` Russell King - ARM Linux
@ 2010-07-27 10:19 ` Shilimkar, Santosh
2010-07-27 10:41 ` Russell King - ARM Linux
0 siblings, 1 reply; 14+ messages in thread
From: Shilimkar, Santosh @ 2010-07-27 10:19 UTC (permalink / raw)
To: linux-arm-kernel
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
> Sent: Tuesday, July 27, 2010 3:31 PM
> To: Shilimkar, Santosh
> Cc: Mankad, Maulik Ojas; linux-usb at vger.kernel.org; linux-
> omap at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> Subject: Re: Issue with file transfers to a mass storage device on SMP
> system
>
> On Tue, Jul 27, 2010 at 03:08:54PM +0530, Shilimkar, Santosh wrote:
> > As discussed, the main reason is the cache maintenance isn't done on
> > "bcb->CDB" buffers and hence the data remains in CPU write buffer
> > instead of the physical memory on which DMA operates.
>
> struct bulk_cb_wrap {
> __le32 Signature; /* contains 'USBC' */
> __u32 Tag; /* unique per command id */
> __le32 DataTransferLength; /* size of data */
> __u8 Flags; /* direction in bit 0 */
> __u8 Lun; /* LUN normally 0 */
> __u8 Length; /* of of the CDB */
> __u8 CDB[16]; /* max command */
> };
>
> So, CDB is contained within bcb...bcb+sizeof(*bcb).
>
> The bcb is passed to usb_stor_bulk_transfer_buf:
>
> result = usb_stor_bulk_transfer_buf(us, us->send_bulk_pipe,
> bcb, cbwlen, NULL);
>
> which fills it into a URB:
>
> usb_fill_bulk_urb(us->current_urb, us->pusb_dev, pipe, buf,
> length,
> usb_stor_blocking_completion, NULL);
>
> This sets the URB buffer pointers:
>
> urb->transfer_buffer = transfer_buffer;
> urb->transfer_buffer_length = buffer_length;
>
> And this buffer should be dma-mapped and dma-unmapped as appropriate.
>
> Wasn't there an issue with the DMA mapping being used with a PIO USB
> host recently? Is that the problem here?
That's correct.
The last issue was otherway round where we were doing map/unmpap on PIO buffer.
This issue is because "CDB[16]", is not cleaned up before merging it into CBW.
Regards,
Santosh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 10:19 ` Shilimkar, Santosh
@ 2010-07-27 10:41 ` Russell King - ARM Linux
2010-07-27 12:00 ` Shilimkar, Santosh
0 siblings, 1 reply; 14+ messages in thread
From: Russell King - ARM Linux @ 2010-07-27 10:41 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jul 27, 2010 at 03:49:29PM +0530, Shilimkar, Santosh wrote:
> > On Tue, Jul 27, 2010 at 03:08:54PM +0530, Shilimkar, Santosh wrote:
> > > As discussed, the main reason is the cache maintenance isn't done on
> > > "bcb->CDB" buffers and hence the data remains in CPU write buffer
> > > instead of the physical memory on which DMA operates.
> >
> > struct bulk_cb_wrap {
> > __le32 Signature; /* contains 'USBC' */
> > __u32 Tag; /* unique per command id */
> > __le32 DataTransferLength; /* size of data */
> > __u8 Flags; /* direction in bit 0 */
> > __u8 Lun; /* LUN normally 0 */
> > __u8 Length; /* of of the CDB */
> > __u8 CDB[16]; /* max command */
> > };
> >
> > So, CDB is contained within bcb...bcb+sizeof(*bcb).
> >
> > The bcb is passed to usb_stor_bulk_transfer_buf:
> >
> > result = usb_stor_bulk_transfer_buf(us, us->send_bulk_pipe,
> > bcb, cbwlen, NULL);
> >
> > which fills it into a URB:
> >
> > usb_fill_bulk_urb(us->current_urb, us->pusb_dev, pipe, buf,
> > length,
> > usb_stor_blocking_completion, NULL);
> >
> > This sets the URB buffer pointers:
> >
> > urb->transfer_buffer = transfer_buffer;
> > urb->transfer_buffer_length = buffer_length;
> >
> > And this buffer should be dma-mapped and dma-unmapped as appropriate.
> >
> > Wasn't there an issue with the DMA mapping being used with a PIO USB
> > host recently? Is that the problem here?
> That's correct.
> The last issue was otherway round where we were doing map/unmpap on PIO buffer.
That's exactly what I said.
> This issue is because "CDB[16]", is not cleaned up before merging it into CBW.
The CDB array is part of the CBW (struct bulk_cb_wrap). When the
struct bulk_cb_wrap is mapped for DMA, the CDB will also be as it
is contained entirely within the CBW structure.
As USB deals with the whole of the CBW as one block, it can't be that
half of it is being DMA-mapped and the other half isn't - something
else must be going on. If the CDB was a separate chunk of memory, I
could believe what you're suggesting.
What we _do_ know is that 2.6.35-rc5 misses Catalin's barrier patches
for readl+writel, which means that there are ordering issues between
writing to memory and writing to devices. This is what is going on
here, and it is confirmed by this:
| (3)Using wmb() (Write memory barrier) before starting DMA transfer in MUSB
| driver fixes the issue.
from Maulik. The fix is to have Catalin's ordered IO accessors which
add the wmb() internally to writel() to ensure that memory accesses
are visible.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 10:41 ` Russell King - ARM Linux
@ 2010-07-27 12:00 ` Shilimkar, Santosh
2010-07-27 13:45 ` Shilimkar, Santosh
0 siblings, 1 reply; 14+ messages in thread
From: Shilimkar, Santosh @ 2010-07-27 12:00 UTC (permalink / raw)
To: linux-arm-kernel
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
> Sent: Tuesday, July 27, 2010 4:12 PM
> To: Shilimkar, Santosh
> Cc: Mankad, Maulik Ojas; linux-usb at vger.kernel.org; linux-
> omap at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> Subject: Re: Issue with file transfers to a mass storage device on SMP
> system
>
> On Tue, Jul 27, 2010 at 03:49:29PM +0530, Shilimkar, Santosh wrote:
> > > On Tue, Jul 27, 2010 at 03:08:54PM +0530, Shilimkar, Santosh wrote:
> > > > As discussed, the main reason is the cache maintenance isn't done on
> > > > "bcb->CDB" buffers and hence the data remains in CPU write buffer
> > > > instead of the physical memory on which DMA operates.
> > >
> > > struct bulk_cb_wrap {
> > > __le32 Signature; /* contains 'USBC' */
> > > __u32 Tag; /* unique per command id */
> > > __le32 DataTransferLength; /* size of data */
> > > __u8 Flags; /* direction in bit 0 */
> > > __u8 Lun; /* LUN normally 0 */
> > > __u8 Length; /* of of the CDB */
> > > __u8 CDB[16]; /* max command */
> > > };
> > >
> > > So, CDB is contained within bcb...bcb+sizeof(*bcb).
> > >
> > > The bcb is passed to usb_stor_bulk_transfer_buf:
> > >
> > > result = usb_stor_bulk_transfer_buf(us, us->send_bulk_pipe,
> > > bcb, cbwlen, NULL);
> > >
> > > which fills it into a URB:
> > >
> > > usb_fill_bulk_urb(us->current_urb, us->pusb_dev, pipe, buf,
> > > length,
> > > usb_stor_blocking_completion, NULL);
> > >
> > > This sets the URB buffer pointers:
> > >
> > > urb->transfer_buffer = transfer_buffer;
> > > urb->transfer_buffer_length = buffer_length;
> > >
> > > And this buffer should be dma-mapped and dma-unmapped as appropriate.
> > >
> > > Wasn't there an issue with the DMA mapping being used with a PIO USB
> > > host recently? Is that the problem here?
> > That's correct.
> > The last issue was otherway round where we were doing map/unmpap on PIO
> buffer.
>
> That's exactly what I said.
>
> > This issue is because "CDB[16]", is not cleaned up before merging it
> into CBW.
>
> The CDB array is part of the CBW (struct bulk_cb_wrap). When the
> struct bulk_cb_wrap is mapped for DMA, the CDB will also be as it
> is contained entirely within the CBW structure.
>
> As USB deals with the whole of the CBW as one block, it can't be that
> half of it is being DMA-mapped and the other half isn't - something
> else must be going on. If the CDB was a separate chunk of memory, I
> could believe what you're suggesting.
>
> What we _do_ know is that 2.6.35-rc5 misses Catalin's barrier patches
> for readl+writel, which means that there are ordering issues between
> writing to memory and writing to devices. This is what is going on
> here, and it is confirmed by this:
>
> | (3)Using wmb() (Write memory barrier) before starting DMA transfer in
> MUSB
> | driver fixes the issue.
>
> from Maulik. The fix is to have Catalin's ordered IO accessors which
> add the wmb() internally to writel() to ensure that memory accesses
> are visible.
We have already those patches Russell and still see the issue. I think WMB
is helping because data is just 16 bytes which can reside in WB. Had it been a bigger buffer this wouldn't have worked.
Will get back to you with more data on this.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 12:00 ` Shilimkar, Santosh
@ 2010-07-27 13:45 ` Shilimkar, Santosh
2010-07-27 13:59 ` Russell King - ARM Linux
0 siblings, 1 reply; 14+ messages in thread
From: Shilimkar, Santosh @ 2010-07-27 13:45 UTC (permalink / raw)
To: linux-arm-kernel
> -----Original Message-----
> From: linux-omap-owner at vger.kernel.org [mailto:linux-omap-
> owner at vger.kernel.org] On Behalf Of Shilimkar, Santosh
> Sent: Tuesday, July 27, 2010 5:31 PM
> To: Russell King - ARM Linux
> Cc: Mankad, Maulik Ojas; linux-usb at vger.kernel.org; linux-
> omap at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> Subject: RE: Issue with file transfers to a mass storage device on SMP
> system
>
> > -----Original Message-----
> > From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
> > Sent: Tuesday, July 27, 2010 4:12 PM
> > To: Shilimkar, Santosh
> > Cc: Mankad, Maulik Ojas; linux-usb at vger.kernel.org; linux-
> > omap at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> > Subject: Re: Issue with file transfers to a mass storage device on SMP
> > system
> >
> > On Tue, Jul 27, 2010 at 03:49:29PM +0530, Shilimkar, Santosh wrote:
> > > > On Tue, Jul 27, 2010 at 03:08:54PM +0530, Shilimkar, Santosh wrote:
> > > > > As discussed, the main reason is the cache maintenance isn't done
> on
> > > > > "bcb->CDB" buffers and hence the data remains in CPU write
> buffer
> > > > > instead of the physical memory on which DMA operates.
> > > >
> > > > struct bulk_cb_wrap {
> > > > __le32 Signature; /* contains 'USBC' */
> > > > __u32 Tag; /* unique per command id */
> > > > __le32 DataTransferLength; /* size of data */
> > > > __u8 Flags; /* direction in bit 0 */
> > > > __u8 Lun; /* LUN normally 0 */
> > > > __u8 Length; /* of of the CDB */
> > > > __u8 CDB[16]; /* max command */
> > > > };
> > > >
> > > > So, CDB is contained within bcb...bcb+sizeof(*bcb).
> > > >
> > > > The bcb is passed to usb_stor_bulk_transfer_buf:
> > > >
> > > > result = usb_stor_bulk_transfer_buf(us, us->send_bulk_pipe,
> > > > bcb, cbwlen, NULL);
> > > >
> > > > which fills it into a URB:
> > > >
> > > > usb_fill_bulk_urb(us->current_urb, us->pusb_dev, pipe, buf,
> > > > length,
> > > > usb_stor_blocking_completion, NULL);
> > > >
> > > > This sets the URB buffer pointers:
> > > >
> > > > urb->transfer_buffer = transfer_buffer;
> > > > urb->transfer_buffer_length = buffer_length;
> > > >
> > > > And this buffer should be dma-mapped and dma-unmapped as
> appropriate.
> > > >
> > > > Wasn't there an issue with the DMA mapping being used with a PIO USB
> > > > host recently? Is that the problem here?
> > > That's correct.
> > > The last issue was otherway round where we were doing map/unmpap on
> PIO
> > buffer.
> >
> > That's exactly what I said.
> >
> > > This issue is because "CDB[16]", is not cleaned up before merging it
> > into CBW.
> >
> > The CDB array is part of the CBW (struct bulk_cb_wrap). When the
> > struct bulk_cb_wrap is mapped for DMA, the CDB will also be as it
> > is contained entirely within the CBW structure.
> >
> > As USB deals with the whole of the CBW as one block, it can't be that
> > half of it is being DMA-mapped and the other half isn't - something
> > else must be going on. If the CDB was a separate chunk of memory, I
> > could believe what you're suggesting.
> >
> > What we _do_ know is that 2.6.35-rc5 misses Catalin's barrier patches
> > for readl+writel, which means that there are ordering issues between
> > writing to memory and writing to devices. This is what is going on
> > here, and it is confirmed by this:
> >
> > | (3)Using wmb() (Write memory barrier) before starting DMA transfer in
> > MUSB
> > | driver fixes the issue.
> >
> > from Maulik. The fix is to have Catalin's ordered IO accessors which
> > add the wmb() internally to writel() to ensure that memory accesses
> > are visible.
> We have already those patches Russell and still see the issue. I think WMB
> is helping because data is just 16 bytes which can reside in WB. Had it
> been a bigger buffer this wouldn't have worked.
>
> Will get back to you with more data on this.
Maulik and I looked at the code and below is what seems to be an issue.
The mass storage USB layer is not respecting the DMA/CPU buffer ownership.
"usb_stor_Bulk_transport" gets "us-iobuf" which is already mapped for DMA.
But inside this function the buffers is updated by the CPU which should not
be done as per rule. Same buffer passed down for DMA to transfer where the
data might remain in CPU cache/WB. The dma map is not done on this buffer because it's being done already before passing it to 'usb_stor_Bulk_transport'
Some USB expert can comment if this is indeed the case !!
Regards,
Santosh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 13:45 ` Shilimkar, Santosh
@ 2010-07-27 13:59 ` Russell King - ARM Linux
2010-07-27 14:14 ` Shilimkar, Santosh
0 siblings, 1 reply; 14+ messages in thread
From: Russell King - ARM Linux @ 2010-07-27 13:59 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jul 27, 2010 at 07:15:25PM +0530, Shilimkar, Santosh wrote:
> > -----Original Message-----
> > From: linux-omap-owner at vger.kernel.org [mailto:linux-omap-
> > owner at vger.kernel.org] On Behalf Of Shilimkar, Santosh
> > Sent: Tuesday, July 27, 2010 5:31 PM
> > To: Russell King - ARM Linux
> > Cc: Mankad, Maulik Ojas; linux-usb at vger.kernel.org; linux-
> > omap at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> > Subject: RE: Issue with file transfers to a mass storage device on SMP
> > system
> >
> > We have already those patches Russell and still see the issue. I think WMB
> > is helping because data is just 16 bytes which can reside in WB. Had it
> > been a bigger buffer this wouldn't have worked.
> >
> > Will get back to you with more data on this.
>
> Maulik and I looked at the code and below is what seems to be an issue.
> The mass storage USB layer is not respecting the DMA/CPU buffer ownership.
>
> "usb_stor_Bulk_transport" gets "us-iobuf" which is already mapped for DMA.
> But inside this function the buffers is updated by the CPU which should not
> be done as per rule. Same buffer passed down for DMA to transfer where the
> data might remain in CPU cache/WB. The dma map is not done on this buffer
> because it's being done already before passing it to 'usb_stor_Bulk_transport'
Hang on.
us->iobuf is a DMA coherent buffer, allocated by usb_alloc_coherent(),
in turn hcd_buffer_alloc(). This comes from either a DMA pool, or
dma_alloc_coherent(). These buffers are simultaneously owned by both
the CPU and the DMA agent.
So to talk about the buffer "already being mapped" and "buffer ownership"
is wrong. It's supposed to be a DMA coherent buffer, and trying to use
dma_map_*() on it definitely won't work.
This memory will be mapped in as 'normal memory, uncached', and with
Catalins IO ordering patches, we rely upon wmb() ensuring that data
written to it becomes visible to the DMA agents. (wmb() by default
aliases to mb(), which is a dsb() followed by outer_sync().)
I think OMAP overrides the definitions of the mandatory barriers -
the question is whether the OMAP implementation is sufficient to ensure
that data written to a 'normal memory, uncached' mapping makes it out
RAM so that the DMA agents can see it.
As the OMAP mandatory barrier implementation isn't in mainline, I can't
comment on that. However, I feel certain that this is where the problem
is.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 13:59 ` Russell King - ARM Linux
@ 2010-07-27 14:14 ` Shilimkar, Santosh
2010-07-27 14:21 ` Russell King - ARM Linux
0 siblings, 1 reply; 14+ messages in thread
From: Shilimkar, Santosh @ 2010-07-27 14:14 UTC (permalink / raw)
To: linux-arm-kernel
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
> Sent: Tuesday, July 27, 2010 7:30 PM
> To: Shilimkar, Santosh
> Cc: Mankad, Maulik Ojas; linux-usb at vger.kernel.org; linux-
> omap at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> Subject: Re: Issue with file transfers to a mass storage device on SMP
> system
>
> On Tue, Jul 27, 2010 at 07:15:25PM +0530, Shilimkar, Santosh wrote:
> > > -----Original Message-----
> > > From: linux-omap-owner at vger.kernel.org [mailto:linux-omap-
> > > owner at vger.kernel.org] On Behalf Of Shilimkar, Santosh
> > > Sent: Tuesday, July 27, 2010 5:31 PM
> > > To: Russell King - ARM Linux
> > > Cc: Mankad, Maulik Ojas; linux-usb at vger.kernel.org; linux-
> > > omap at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> > > Subject: RE: Issue with file transfers to a mass storage device on SMP
> > > system
> > >
> > > We have already those patches Russell and still see the issue. I think
> WMB
> > > is helping because data is just 16 bytes which can reside in WB. Had
> it
> > > been a bigger buffer this wouldn't have worked.
> > >
> > > Will get back to you with more data on this.
> >
> > Maulik and I looked at the code and below is what seems to be an issue.
> > The mass storage USB layer is not respecting the DMA/CPU buffer
> ownership.
> >
> > "usb_stor_Bulk_transport" gets "us-iobuf" which is already mapped for
> DMA.
> > But inside this function the buffers is updated by the CPU which should
> not
> > be done as per rule. Same buffer passed down for DMA to transfer where
> the
> > data might remain in CPU cache/WB. The dma map is not done on this
> buffer
> > because it's being done already before passing it to
> 'usb_stor_Bulk_transport'
>
> Hang on.
>
> us->iobuf is a DMA coherent buffer, allocated by usb_alloc_coherent(),
> in turn hcd_buffer_alloc(). This comes from either a DMA pool, or
> dma_alloc_coherent(). These buffers are simultaneously owned by both
> the CPU and the DMA agent.
>
Good point. We thought it's a kmalloc. Now it makes complete sense about the barrier effectiveness
> So to talk about the buffer "already being mapped" and "buffer ownership"
> is wrong. It's supposed to be a DMA coherent buffer, and trying to use
> dma_map_*() on it definitely won't work.
>
Fully agree
> This memory will be mapped in as 'normal memory, uncached', and with
> Catalins IO ordering patches, we rely upon wmb() ensuring that data
> written to it becomes visible to the DMA agents. (wmb() by default
> aliases to mb(), which is a dsb() followed by outer_sync().)
>
> I think OMAP overrides the definitions of the mandatory barriers -
> the question is whether the OMAP implementation is sufficient to ensure
> that data written to a 'normal memory, uncached' mapping makes it out
> RAM so that the DMA agents can see it.
>
OMAP doesn't override because the default definition is good enough now.
Shouldn't below work ?
#elif __LINUX_ARM_ARCH__ >= 7 || defined(CONFIG_SMP)
#define mb() do { dsb(); outer_sync(); } while (0)
#define rmb() dmb()
#define wmb() mb()
> As the OMAP mandatory barrier implementation isn't in mainline, I can't
> comment on that. However, I feel certain that this is where the problem
> is.
Do you think with above setting it should be still a problem ? I mean
with " CONFIG_ARCH_HAS_BARRIERS" not enabled
Regards,
Santosh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 14:14 ` Shilimkar, Santosh
@ 2010-07-27 14:21 ` Russell King - ARM Linux
2010-07-27 14:29 ` Shilimkar, Santosh
0 siblings, 1 reply; 14+ messages in thread
From: Russell King - ARM Linux @ 2010-07-27 14:21 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jul 27, 2010 at 07:44:20PM +0530, Shilimkar, Santosh wrote:
> OMAP doesn't override because the default definition is good enough now.
Ah, good to know.
> Shouldn't below work ?
> #elif __LINUX_ARM_ARCH__ >= 7 || defined(CONFIG_SMP)
> #define mb() do { dsb(); outer_sync(); } while (0)
> #define rmb() dmb()
> #define wmb() mb()
Yes, that should get it out of the CPU and caches, and onto the bus.
However, I need to check up exactly what a write to the L2x0 SYNC
register gives us...
> > As the OMAP mandatory barrier implementation isn't in mainline, I can't
> > comment on that. However, I feel certain that this is where the problem
> > is.
>
> Do you think with above setting it should be still a problem ? I mean
> with " CONFIG_ARCH_HAS_BARRIERS" not enabled
Well, the question is whether getting it out of the outer cache (and
performing an effective memory barrier to the outer cache) is sufficient
for the DMA agent to see the data.
Could the data be sitting somewhere in the interconnect between the
CPU pushing it out of the outer cache and the DMA agent trying to read
from memory?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 14:21 ` Russell King - ARM Linux
@ 2010-07-27 14:29 ` Shilimkar, Santosh
2010-07-27 16:07 ` Russell King - ARM Linux
0 siblings, 1 reply; 14+ messages in thread
From: Shilimkar, Santosh @ 2010-07-27 14:29 UTC (permalink / raw)
To: linux-arm-kernel
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
> Sent: Tuesday, July 27, 2010 7:51 PM
> To: Shilimkar, Santosh
> Cc: Mankad, Maulik Ojas; linux-usb at vger.kernel.org; linux-
> omap at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> Subject: Re: Issue with file transfers to a mass storage device on SMP
> system
>
> On Tue, Jul 27, 2010 at 07:44:20PM +0530, Shilimkar, Santosh wrote:
> > OMAP doesn't override because the default definition is good enough now.
>
> Ah, good to know.
>
> > Shouldn't below work ?
> > #elif __LINUX_ARM_ARCH__ >= 7 || defined(CONFIG_SMP)
> > #define mb() do { dsb(); outer_sync(); } while (0)
> > #define rmb() dmb()
> > #define wmb() mb()
>
> Yes, that should get it out of the CPU and caches, and onto the bus.
> However, I need to check up exactly what a write to the L2x0 SYNC
> register gives us...
>
> > > As the OMAP mandatory barrier implementation isn't in mainline, I
> can't
> > > comment on that. However, I feel certain that this is where the
> problem
> > > is.
> >
> > Do you think with above setting it should be still a problem ? I mean
> > with " CONFIG_ARCH_HAS_BARRIERS" not enabled
>
> Well, the question is whether getting it out of the outer cache (and
> performing an effective memory barrier to the outer cache) is sufficient
> for the DMA agent to see the data.
>
> Could the data be sitting somewhere in the interconnect between the
> CPU pushing it out of the outer cache and the DMA agent trying to read
> from memory?
Once it's pushed out of L2X WB, it will hit the memory. Just to give an additional data point, with CATC analyzer what we see that only those
16 bytes CDB are 0x0. This buffer was memset to 0x0 just before the
memcpy.
I am just wondering who will issue a barrier(wmb) on this buffer before DMA
start if we don't use dma-mapping APIs? May be for dma_alloc_coherent
buffers, we need to explicitly issue the barrier.
Is that the expectation? If yes then the work-around we used seems to the
fix.
Regards,
Santosh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 14:29 ` Shilimkar, Santosh
@ 2010-07-27 16:07 ` Russell King - ARM Linux
2010-07-28 5:15 ` Shilimkar, Santosh
0 siblings, 1 reply; 14+ messages in thread
From: Russell King - ARM Linux @ 2010-07-27 16:07 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jul 27, 2010 at 07:59:17PM +0530, Shilimkar, Santosh wrote:
> Once it's pushed out of L2X WB, it will hit the memory. Just to give an additional data point, with CATC analyzer what we see that only those
> 16 bytes CDB are 0x0. This buffer was memset to 0x0 just before the
> memcpy.
>
> I am just wondering who will issue a barrier(wmb) on this buffer before DMA
> start if we don't use dma-mapping APIs? May be for dma_alloc_coherent
> buffers, we need to explicitly issue the barrier.
wmb's don't take addresses - they're a global thing. All stores before
the wmb() take effect before stores after the wmb().
The wmb() is issued by Catalin's IO ordering patches:
+#define writeb(v,c) ({ wmb(); writeb_relaxed(v,c); })
+#define writew(v,c) ({ wmb(); writew_relaxed(v,c); })
+#define writel(v,c) ({ wmb(); writel_relaxed(v,c); })
So, the wmb() is issued to ensure that all stores to (eg) buffers
allocated by dma_alloc_coherent() hit memory prior to the store to
the device.
Now, if you're writing to registers using something other than write[bwl](),
you'll miss the wmb(), and therefore your DMA buffer won't be up to date.
And this _is_ the problem:
drivers/usb/musb/musb_io.h:static inline void musb_writew(void __iomem *addr, unsigned offset, u16 data)
drivers/usb/musb/musb_io.h: { __raw_writew(data, addr + offset); }
drivers/usb/musb/musb_io.h:static inline void musb_writel(void __iomem *addr, unsigned offset, u32 data)
drivers/usb/musb/musb_io.h: { __raw_writel(data, addr + offset); }
drivers/usb/musb/musb_io.h:static inline void musb_writeb(void __iomem *addr, unsigned offset, u8 data)
drivers/usb/musb/musb_io.h: __raw_writew(tmp, addr + (offset & ~1));
drivers/usb/musb/musb_io.h:static inline void musb_writeb(void __iomem *addr, unsigned offset, u8 data)
drivers/usb/musb/musb_io.h: { __raw_writeb(data, addr + offset); }
All IO performed by musb misses out on the barriers - so what you
need to do is either add wmb()s to these, or you need to ensure
that the driver has the various necessary memory barriers in place.
The latter solution will be more efficient.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Issue with file transfers to a mass storage device on SMP system
2010-07-27 16:07 ` Russell King - ARM Linux
@ 2010-07-28 5:15 ` Shilimkar, Santosh
0 siblings, 0 replies; 14+ messages in thread
From: Shilimkar, Santosh @ 2010-07-28 5:15 UTC (permalink / raw)
To: linux-arm-kernel
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
> Sent: Tuesday, July 27, 2010 9:37 PM
> To: Shilimkar, Santosh
> Cc: Mankad, Maulik Ojas; linux-usb at vger.kernel.org; linux-
> omap at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> Subject: Re: Issue with file transfers to a mass storage device on SMP
> system
>
> On Tue, Jul 27, 2010 at 07:59:17PM +0530, Shilimkar, Santosh wrote:
> > Once it's pushed out of L2X WB, it will hit the memory. Just to give an
> additional data point, with CATC analyzer what we see that only those
> > 16 bytes CDB are 0x0. This buffer was memset to 0x0 just before the
> > memcpy.
> >
> > I am just wondering who will issue a barrier(wmb) on this buffer before
> DMA
> > start if we don't use dma-mapping APIs? May be for dma_alloc_coherent
> > buffers, we need to explicitly issue the barrier.
>
> wmb's don't take addresses - they're a global thing. All stores before
> the wmb() take effect before stores after the wmb().
>
> The wmb() is issued by Catalin's IO ordering patches:
>
> +#define writeb(v,c) ({ wmb(); writeb_relaxed(v,c); })
> +#define writew(v,c) ({ wmb(); writew_relaxed(v,c); })
> +#define writel(v,c) ({ wmb(); writel_relaxed(v,c); })
>
> So, the wmb() is issued to ensure that all stores to (eg) buffers
> allocated by dma_alloc_coherent() hit memory prior to the store to
> the device.
>
> Now, if you're writing to registers using something other than
> write[bwl](),
> you'll miss the wmb(), and therefore your DMA buffer won't be up to date.
>
Yep.
> And this _is_ the problem:
>
> drivers/usb/musb/musb_io.h:static inline void musb_writew(void __iomem
> *addr, unsigned offset, u16 data)
> drivers/usb/musb/musb_io.h: { __raw_writew(data, addr + offset); }
> drivers/usb/musb/musb_io.h:static inline void musb_writel(void __iomem
> *addr, unsigned offset, u32 data)
> drivers/usb/musb/musb_io.h: { __raw_writel(data, addr + offset); }
> drivers/usb/musb/musb_io.h:static inline void musb_writeb(void __iomem
> *addr, unsigned offset, u8 data)
> drivers/usb/musb/musb_io.h: __raw_writew(tmp, addr + (offset & ~1));
> drivers/usb/musb/musb_io.h:static inline void musb_writeb(void __iomem
> *addr, unsigned offset, u8 data)
> drivers/usb/musb/musb_io.h: { __raw_writeb(data, addr + offset); }
>
> All IO performed by musb misses out on the barriers - so what you
> need to do is either add wmb()s to these, or you need to ensure
> that the driver has the various necessary memory barriers in place.
> The latter solution will be more efficient.
I also agree that patching driver is less impacting and non-intrusive. We will go ahead with the original work-around which adds barrier in the driver.
Thanks for the good discussion on this.
Regards,
Santosh
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2010-07-28 5:15 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-27 6:34 Issue with file transfers to a mass storage device on SMP system Maulik
2010-07-27 7:05 ` Ming Lei
2010-07-27 9:38 ` Shilimkar, Santosh
2010-07-27 10:01 ` Russell King - ARM Linux
2010-07-27 10:19 ` Shilimkar, Santosh
2010-07-27 10:41 ` Russell King - ARM Linux
2010-07-27 12:00 ` Shilimkar, Santosh
2010-07-27 13:45 ` Shilimkar, Santosh
2010-07-27 13:59 ` Russell King - ARM Linux
2010-07-27 14:14 ` Shilimkar, Santosh
2010-07-27 14:21 ` Russell King - ARM Linux
2010-07-27 14:29 ` Shilimkar, Santosh
2010-07-27 16:07 ` Russell King - ARM Linux
2010-07-28 5:15 ` Shilimkar, Santosh
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox