From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Nelson Date: Tue, 06 Mar 2012 14:28:34 -0700 Subject: [U-Boot] [PATCH V2] net: fec_mxc: allow use with cache enabled In-Reply-To: <201203062222.54929.marex@denx.de> References: <201203062045.58890.marex@denx.de> <4F567F57.1030901@boundarydevices.com> <201203062222.54929.marex@denx.de> Message-ID: <4F568182.4060905@boundarydevices.com> List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: u-boot@lists.denx.de On 03/06/2012 02:22 PM, Marek Vasut wrote: > Dear Eric Nelson, > >> On 03/06/2012 12:45 PM, Marek Vasut wrote: >>> Dear Eric Nelson, >>> >>>> On 03/05/2012 01:06 PM, Marek Vasut wrote: >>>>> Dear Eric Nelson, >>>>> >>>>>> ensure that transmit and receive buffers are cache-line aligned >>>>>> >>>>>> invalidate cache after each packet received >>>>>> flush cache before transmitting >>>>>> >>>>>> Original patch by Marek: >>>>>> http://lists.denx.de/pipermail/u-boot/2012-February/117695.html >>>>> >>>>> Would be cool to Cc me :-p >>>> >>>> Sorry about that. >>> >>> It's ok, don't worry about it ;-) >>> >>> [...] >>> >>>>> I think this "writel()" call is bogus and should be removed in >>>>> subsequent patch and replaced with simple assignment. It was here >>>>> probably due to cache issues on PPC? >>>> >>>> The RBD has me puzzled. Do we treat them like registers and use >>>> readx/writex or like in-RAM data structures... >>> >>> I'd go for the in-RAM data structures, but such conversion should happen >>> in a separate patch only AFTER the cache support is in. >>> >>> [...] >> >> Sounds good. >> >>>>>> + if (!fec->rbd_base) { >>>>>> + ret = -ENOMEM; >>>>>> + goto err2; >>>>>> + } >>>>>> + memset(fec->rbd_base, 0, size); >>>>>> + } >>>>> >>>>> We might want to flush the descriptors to memory after they have been >>>>> inited? >>>> >>>> Again, good catch. >>>> >>>> On this topic (initialization of RBD), I had a bit of a concern >>>> regarding the initialization of things. >>>> >>>> In fec_open, the receive buffer descriptors are initialized and the >>>> last one set is to 'wrap'. If this code were to execute when the >>>> controller is live, bad things would surely happen. >>>> >>>> I traced through all of the paths I can see, and I believe that >>>> we're safe. It appears that fec_halt() will be called prior to >>>> any call to fec_init() and fec_open(). >>> >>> Yes, this will only happen if something went wrong. >>> >>>> In fec_open() a number of calls to fec_rbd_clean() are made and >>>> a single flush_dcache() is made afterwards. >>>> >>>> While this works and causes less thrashing than per-RBD flushes, >>>> I think the code would be simpler if fec_rbd_clean just did the >>>> flush itself. This would also remove the need for a separate >>>> flush in fec_recv. >>> >>> Not really, rbd might be (and likely is) smaller than cache line, >>> therefore you won't be able to flush single rbd, unless it's cacheline >>> aligned, which wastes space. >>> >>> [...] >> >> Yeah. Please disregard my comments. I wrote that before I fully >> appreciated what was being done in fec_recv(). >> >>>>>> + invalidate_dcache_range(addr, addr + size); >>>>>> + >>>>> >>>>> The idea here is the following (demo uses 32byte long cachelines): >>>>> >>>>> [DESC1][DESC2][DESC3][DESC4][DESC5][DESC6] >>>>> `------- cacheline --------' >>>>> >>>>> We want to start retrieving contents of DESC3, therefore "addr" points >>>>> to DESC1, "size" is size of cacheline (I hope there's no hardware with >>>>> 8 byte cachelines, but this should be ok here). >>>>> >>>>> NOTE[5]: Here we can invalidate the whole cacheline, because the >>>>> descriptors so far were modified only be hardware, not by us. We are >>>>> not writing anything there so we won't loose any information. >>>>> >>>>> NOTE[6]: This invalidation ensures that we always have a fresh copy of >>>>> the cacheline containing all the descriptors, therefore we always have >>>>> a fresh status of the descriptors we are about to pick. Since this is >>>>> a sequential execution, the cache eviction should not kick in here (or >>>>> might it?). >>>> >>>> Another way to look at this is this: >>>> After fec_open(), the hardware owns the rbd, and all we should do >>>> is read it. In order to make sure we don't have a stale copy, we >>>> need to call invalidate() before looking at the values. >>>> >>>> Tracing the code to find out whether this is true, the only write I see >>>> is within fec_recv() when the last descriptor is full, when the driver >>>> takes ownership of **all** of the descriptors, calling fec_rbd_clean() >>>> on each. >>>> >>>> The only thing that looks funky is this: >>>> size = (CONFIG_FEC_ALIGN / sizeof(struct fec_bd)) - 1; >>>> if ((fec->rbd_index& size) == size) { >>>> >>>> Wouldn't a test of rbd_index against FEC_RBD_NUM be more appropriate? >>>> i.e. >>>> >>>> if (fec->rbd_index == FEC_RBD_NUM-1) { >>> >>> I believe the FEC doesn't always start from rbd_index == 0, and if you >>> were to receive more than 64 rbds between open() and close(), this >>> implementation works, your would fail. >> >> Yep. Disregard that too. >> >> >> >>>>> The solution is the following: >>>>> >>>>> 1) Compute how many descriptors are per-cache line >>>>> 2) Make sure FEC_RBD_NUM * sizeof(struct fec_bd) is at least 2 * >>>>> CONFIG_FEC_DATA_ALIGNMENT in size, see NOTE[11]. >>>>> 3) Once the last RX buffer in the cacheline is processed, mark them all >>>>> clean and flush them all, see NOTE[10]. >>>>> >>>>> NOTE[10]: This is legal, because the hardware won't use RX descriptors >>>>> that it marked dirty (which means not picked up by software yet). We >>>>> clean the desciptors in an order the hardware would pick them up again >>>>> so there's no problem with race condition either. The only possible >>>>> issue here is if there was hardware with cacheline size smaller than >>>>> descriptor size (we should add a check for this at the begining of the >>>>> file). >>>>> >>>>> NOTE[11]: This is because we want the FEC to overwrite descriptors >>>>> below the other cacheline while we're marking the one containing >>>>> retrieved descriptors clean. >>>> >>>> Ahah! Now I see what the size calculation is doing. >>>> >>>> A well-named constant, maybe "RXDESC_PER_CACHELINE" would be useful >>>> here. >>> >>> Yes >>> >>>> #define RXDESC_PER_CACHELINE (CONFIG_FEC_ALIGN/sizeof(struct fec_bd)) >>>> >>>>>> - fec_rbd_clean(fec->rbd_index == (FEC_RBD_NUM - 1) ? 1 : > 0, rbd); >>>>>> + size = (CONFIG_FEC_DATA_ALIGNMENT / sizeof(struct > fec_bd)) - 1; >>>>>> >>>> size = RXDESC_PER_CACHELINE-1; >>>> >>>>>> + if ((fec->rbd_index& size) == size) { >>>> >>>> The line above only works if RXDESC_PER_CACHELINE is a multiple of 2, >>>> which is likely to work because sizeof(struct fec_bd) == 8. >>> >>> Adding such a comment (and maybe CPP check) won't hurt. >> >> I'm struggling getting the CPP to do the work at the moment... >> >>>>>> + i = fec->rbd_index - size; >>>>>> + addr = (uint32_t)&fec->rbd_base[i]; >>>>>> + for (; i<= fec->rbd_index + size; i++) { >>>> >>>> This flushes too many descriptors! This should be: >>>> for (; i<= fec->rbd_index; i++) { >>> >>> Agreed >> >> V3 patch forthcoming. >> >>>>> Uh, this was tough. >>>> >>>> How bad do we want cache? >>> >>> We're almost there, why do you ask? :-) >> >> I was just bein' snarky... >> >> I'm making a couple of other small changes in V3: >> >> - change fec_rbd_clean to only have a single >> call to write() and make it clearer that there's >> only one additional flag iff 'last' >> - use comparison to supply 'last' parameter in >> the call to fec_rbd_clean >> >> I think each of these makes the intent clearer. >> >> diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c >> index d5d0d5e..f8691d4 100644 >> --- a/drivers/net/fec_mxc.c >> +++ b/drivers/net/fec_mxc.c >> @@ -355,16 +355,10 @@ static void fec_tbd_init(struct fec_priv *fec) >> */ >> static void fec_rbd_clean(int last, struct fec_bd *pRbd) >> { >> - /* >> - * Reset buffer descriptor as empty >> - */ >> + unsigned short flags = FEC_RBD_EMPTY; >> if (last) >> - writew(FEC_RBD_WRAP | FEC_RBD_EMPTY,&pRbd->status); >> - else >> - writew(FEC_RBD_EMPTY,&pRbd->status); >> - /* >> - * no data in it >> - */ >> + flags |= FEC_RBD_WRAP; >> + writew(flags,&pRbd->status); >> writew(0,&pRbd->data_length); >> } >> >> @@ -880,10 +874,8 @@ static int fec_recv(struct eth_device *dev) >> i = fec->rbd_index - size; >> addr = (uint32_t)&fec->rbd_base[i]; >> for (; i<= fec->rbd_index ; i++) { >> - if (i == (FEC_RBD_NUM - 1)) >> - fec_rbd_clean(1, >> &fec->rbd_base[i]); - else >> - fec_rbd_clean(0, >> &fec->rbd_base[i]); + fec_rbd_clean(i == >> (FEC_RBD_NUM - 1), + >> &fec->rbd_base[i]); >> } >> flush_dcache_range(addr, >> addr + CONFIG_FEC_ALIGN); > > Looking forward to V3! > Here's an early peek. I have to jump on a CC so I haven't had time to generate a proper one. Also haven't straightened out FEC_ALIGN and RXDESC_PER_CACHELINE macros. -------------- next part -------------- A non-text attachment was scrubbed... Name: fec-mxc-cache-v3.patch Type: text/x-patch Size: 13648 bytes Desc: not available URL: