[RFC] Buggy e100.c on ARM / DMA sync interfaces

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC] Buggy e100.c on ARM / DMA sync interfaces
@ 2004-06-30  9:49 Russell King
  2004-06-30 13:16 ` James Bottomley
  0 siblings, 1 reply; 2+ messages in thread
From: Russell King @ 2004-06-30  9:49 UTC (permalink / raw)
  To: Linux Kernel List, Jeff Garzik, James Bottomley

I have a report which indicates that the e100.c driver does not work on
ARM when caches are in writeback mode.  The symptoms indicate that the
card is unable to communicate properly.

Looking up port of RPC 100003/2 on 10.2.40.5
portmap: server 10.2.40.5 not responding, timed out
Root-NFS: Unable to get nfsd port number from server, using default
Looking up port of RPC 100005/1 on 10.2.40.5
portmap: server 10.2.40.5 not responding, timed out

Looking over the e100.c code, I find the following in e100_rx_alloc_skb():

        rx->skb->dev = nic->netdev;
        skb_reserve(rx->skb, rx_offset);
        memcpy(rx->skb->data, &nic->blank_rfd, sizeof(struct rfd));
        rx->dma_addr = pci_map_single(nic->pdev, rx->skb->data,
                RFD_BUF_LEN, PCI_DMA_BIDIRECTIONAL);

        /* Link the RFD to end of RFA by linking previous RFD to
         * this one, and clearing EL bit of previous.  */
        if(rx->prev->skb) {
                struct rfd *prev_rfd = (struct rfd *)rx->prev->skb->data;
                put_unaligned(cpu_to_le32(rx->dma_addr),
                        (u32 *)&prev_rfd->link);
                wmb();
                prev_rfd->command &= ~cpu_to_le16(cb_el);
                pci_dma_sync_single_for_device(nic->pdev, rx->prev->dma_addr,
                        sizeof(struct rfd), PCI_DMA_TODEVICE);
        }

This can't work - and I suspect anyone using the *dma_sync* functions
will be in for the same problem.  Why?

Cache lines.  They have a defined size and are not merely a single byte
or a single word.  If you modify even one single bit, you stand the
chance of writing back the whole cache line, possibly overwriting data
which the device has updated since the cache line was read.

Therefore, if you're going to use the dma_sync functions to modify data
owned by the remote device, you _must_ stop the remote device accessing
the surrounding data _before_ touching it.

With the above code on ARM, it effectively means that we will read the
whole struct rfd and some other data into cache, modify the command
field, and then write _at least_ the whole struct rfd back out, all
with the chip's DMA still running.

_That_ can't be good.

Note - I'm not saying that this is the cause of the above problem, but
that this is something I have spotted while reading through the driver
to ascertain why it possibly could not be working.

Comments?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [RFC] Buggy e100.c on ARM / DMA sync interfaces
  2004-06-30  9:49 [RFC] Buggy e100.c on ARM / DMA sync interfaces Russell King
@ 2004-06-30 13:16 ` James Bottomley
  0 siblings, 0 replies; 2+ messages in thread
From: James Bottomley @ 2004-06-30 13:16 UTC (permalink / raw)
  To: Russell King; +Cc: Linux Kernel List, Jeff Garzik

> This can't work - and I suspect anyone using the *dma_sync* functions
> will be in for the same problem.  Why?
> 
> Cache lines.  They have a defined size and are not merely a single byte
> or a single word.  If you modify even one single bit, you stand the
> chance of writing back the whole cache line, possibly overwriting data
> which the device has updated since the cache line was read.

Hang on, the PCI (and DMA) API explicitly states that to call sync on a
dma mapped area, you must do it for the *whole* of the area.  The API's
also transfer ownership of the area (i.e. only the device *or* the
driver may touch it depending on who has the ownership). This is
designedly to get us out of cache line interference problems.

So the particular problem here is an incorrect *abuse* of the API. 
There is a section in the black magic part of the DMA API that allows
you partially to sync a mapped area (dma_sync_single_range).  However,
it's in the experts only section and it explains that if you do this,
you're responsible for preventing cache line interference and provides
another API (dma_get_cache_alignment) explicitly so the driver knows
what the cache line boundaries are.

> Therefore, if you're going to use the dma_sync functions to modify data
> owned by the remote device, you _must_ stop the remote device accessing
> the surrounding data _before_ touching it.

Precisely, hence the ownership concept.

> With the above code on ARM, it effectively means that we will read the
> whole struct rfd and some other data into cache, modify the command
> field, and then write _at least_ the whole struct rfd back out, all
> with the chip's DMA still running.
> 
> _That_ can't be good.
> 
> Note - I'm not saying that this is the cause of the above problem, but
> that this is something I have spotted while reading through the driver
> to ascertain why it possibly could not be working.
> 
> Comments?

Clearly the e100 needs fixing.  Either by migrating it to the expert
API. Or, in this case, could we not just pad the structure with
appropriate L1_CACHE_ALIGNMENT tags?

James

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2004-06-30 13:17 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-06-30  9:49 [RFC] Buggy e100.c on ARM / DMA sync interfaces Russell King
2004-06-30 13:16 ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox