LinuxPPC-Dev Archive on lore.kernel.org


* Re: RFC: issues concerning the next NAPI interface
From: Bodo Eggert @ 2007-08-24 19:04 UTC (permalink / raw)
  To: Linas Vepstas, Jan-Bernd Themann, netdev, Thomas Klein,
	Jan-Bernd Themann, linux-kernel, linux-ppc, Christoph Raisch,
	Marcus Eder, Stefan Roscher
In-Reply-To: <8VKwj-8ke-27@gated-at.bofh.it>

Linas Vepstas <linas@austin.ibm.com> wrote:
> On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:

>> 3) On modern systems the incoming packets are processed very fast. Especially
>> on SMP systems when we use multiple queues we process only a few packets
>> per napi poll cycle. So NAPI does not work very well here and the interrupt
>> rate is still high.
> 
> I saw this too, on a system that is "modern" but not terribly fast, and
> only slightly (2-way) smp. (the spidernet)
> 
> I experimented with various solutions, none were terribly exciting.  The
> thing that killed all of them was a crazy test case that someone sprung on
> me:  They had written a worst-case network ping-pong app: send one
> packet, wait for reply, send one packet, etc.
> 
> If I waited (indefinitely) for a second packet to show up, the test case
> completely stalled (since no second packet would ever arrive).  And if I
> introduced a timer to wait for a second packet, then I just increased
> the latency in the response to the first packet, and this was noticed,
> and folks complained.

Possible solution / possible brainfart:

Introduce a timer, but don't start to use it to combine packets unless you
receive n packets within the timeframe. If you receive fewer than m packets
within one timeframe, stop using the timer. The system should now have a
decent response time when the network is idle, and when the network is
busy, nobody will complain about the latency.-)
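
The hysteresis described above could be sketched roughly as follows. This is only an illustration, assuming a fixed-length window and two thresholds n and m with m < n; the struct and function names are hypothetical and do not come from any real driver.

```c
#include <stdbool.h>

/* Hypothetical per-device state; none of these names exist in a real driver. */
struct rx_hysteresis {
    unsigned int enable_thresh;  /* "n": packets per window that turn batching on */
    unsigned int disable_thresh; /* "m": fewer packets than this turn batching off */
    unsigned int pkts_in_window; /* packets seen in the current window */
    bool batching;               /* true while the timer is used to combine packets */
};

static void rx_hysteresis_init(struct rx_hysteresis *h,
                               unsigned int n, unsigned int m)
{
    h->enable_thresh = n;
    h->disable_thresh = m;
    h->pkts_in_window = 0;
    h->batching = false;
}

/* Call once per received packet. */
static void rx_hysteresis_packet(struct rx_hysteresis *h)
{
    h->pkts_in_window++;
}

/* Call when a window expires; returns whether to batch in the next window. */
static bool rx_hysteresis_window_end(struct rx_hysteresis *h)
{
    if (!h->batching && h->pkts_in_window >= h->enable_thresh)
        h->batching = true;
    else if (h->batching && h->pkts_in_window < h->disable_thresh)
        h->batching = false;

    h->pkts_in_window = 0;
    return h->batching;
}
```

With this shape, a ping-pong workload that sends one packet per window never reaches n, so it would stay on immediate interrupts, while a busy link would switch to batching after one full window.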
-- 
Funny quotes:
22. When everything's going your way, you're in the wrong lane and going
    the wrong way.
Eat this, spammer: rsRxhvmk@CaR.7eggert.dyndns.org m@z3T.7eggert.dyndns.org

^ permalink raw reply

* Re: [PATCH 2/6] PowerPC 440EPx: Sequoia DTS
From: Sergei Shtylyov @ 2007-08-24 19:10 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linuxppc-dev, David Gibson
In-Reply-To: <b055a84cbb23ef7287c1d585c1810a74@kernel.crashing.org>

Segher Boessenkool wrote:

>>>address-permutation = <0 1 3 2 4 5 7 6 e f d c a b 9 8>;

>>Yes, I was contemplating something like that.

> Let's not define this until we need it though :-)

    Let's not even think of it, since this will end up in a "catch all" driver,
and yet this may not be enough when the flash doesn't support 8-bit R/W, for
example (I've already quoted it...).

>>>I haven't heard or thought of anything better either.  Using "ranges"
>>>is conceptually wrong, even ignoring the technical problems that come
>>>with it.
>>Why is "ranges" conceptually wrong?

> The flash partitions aren't separate devices sitting on a

    Yeah, that's why I decided not to go that way from the very start... though
wait: I didn't do this simply because they're not devices.
That led me to an interesting question: does the device tree have something for
disk partitions?

> "flash bus", they are "sub-devices" of their parent.

    They're quite an abstraction of a device -- although Linux treats them as
separate devices indeed.

>>To be honest this looks rather to me like another case where having
>>overlapping 'reg' and 'ranges' would actually make sense.

> It never makes sense.  You should give the "master" device
> the full "reg" range it covers, and have it define its own
> address space; "sub-devices" can carve out their little hunk
> from that.  You don't want more than one device owning the
> same address range in the same address space.

    So, no "ranges" prop in MTD node is necessary? Phew... :-)

> Segher

WBR, Sergei

^ permalink raw reply

* Re: [PATCH v2] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes
From: Olof Johansson @ 2007-08-24 18:11 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: netdev, jgarzik, linuxppc-dev
In-Reply-To: <20070824140531.ff7d66bf.sfr@canb.auug.org.au>

On Fri, Aug 24, 2007 at 02:05:31PM +1000, Stephen Rothwell wrote:
> On Thu, 23 Aug 2007 13:13:10 -0500 Olof Johansson <olof@lixom.net> wrote:
> >
> >  out:
> > -	pci_dev_put(mac->iob_pdev);
> > -out_put_dma_pdev:
> > -	pci_dev_put(mac->dma_pdev);
> > -out_free_netdev:
> > +	if (mac->iob_pdev)
> > +		pci_dev_put(mac->iob_pdev);
> > +	if (mac->dma_pdev)
> > +		pci_dev_put(mac->dma_pdev);
> 
> It is not documented as such (as far as I can see), but pci_dev_put is
> safe to call with NULL. And there are other places in the kernel that
> explicitly use that fact.

Some places check, others do not. I'll leave it be for now but might take
care of it during some future cleanup. Thanks for pointing it out though.
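
The point about NULL-safe release functions can be illustrated with a small user-space mock. This is not the kernel's pci_dev_put() implementation; struct mock_dev and both functions are hypothetical stand-ins that only show why the callers' NULL checks become redundant once the put function tolerates NULL itself.

```c
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct pci_dev. */
struct mock_dev {
    int refcount;
};

/* Drop one reference; a NULL pointer is accepted and ignored. */
static void mock_dev_put(struct mock_dev *dev)
{
    if (dev == NULL)
        return;
    dev->refcount--;
}

/* Error path that releases both devices without checking each for NULL. */
static void mock_cleanup(struct mock_dev *iob, struct mock_dev *dma)
{
    mock_dev_put(iob);
    mock_dev_put(dma);
}
```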


-Olof

^ permalink raw reply

* Re: 8555CDS BSP on 8548CDS board
From: Andy Fleming @ 2007-08-24 18:38 UTC (permalink / raw)
  To: mike zheng; +Cc: linuxppc-embedded
In-Reply-To: <5c9cd53b0708240900y6537e42cn5ee59b2a2a707768@mail.gmail.com>

On Aug 24, 2007, at 11:00, mike zheng wrote:

> Hi,
>
> I was told Freescale's 8555CDS board is very similar to 8548CDS  
> board. I just wonder what exactly the differences are. can I just  
> put the 8555CDS BSP onto the 8548CDS board?
>
> Thanks  in advance,

The 8555 u-boot is different from the 8548 u-boot.  There are also  
differences in the device-tree (I'm not sure what version of the  
kernel is in the BSP, so I can't say for sure).  Recent versions of  
the Linux kernel merged all of the CDS systems into one kernel.

As for the differences, off the top of my head:

* 8555 vs 8548 chip
* PCI slot on the carrier card is PCI on 8555, PCIe on 8548.
* 8548 has 4 eTSECs, 8555 has 2 TSECs (and the # of ethernet ports  
reflects this).

Andy

^ permalink raw reply

* Re: Problems on porting linux 2.6 to xilinx ML410
From: Grant Likely @ 2007-08-24 18:21 UTC (permalink / raw)
  To: woyuzhilei; +Cc: linuxppc-embedded
In-Reply-To: <200708241111513591137@163.com>

On 8/23/07, woyuzhilei <woyuzhilei@163.com> wrote:
>
>
> Hey:
>     Recently I've been working on a project on the Xilinx ML410 evaluation
> board. The first step is porting Linux 2.6 to the ML410, but I ran into
> problems with this and my project can't proceed, so I come to you for some help.
>     I use the Linux kernel source tree downloaded from
> http://git.secretlab.ca/git/linux-2.6-virtex.git (the
> latest one). I added the files xparameters.h and xparameter_ml40x.h, then added
> arch/ppc/platforms/4xx/xilinx_generic_ppc.c (using the patch I
> got here) and changed its name to ml40x.c. Then I made the necessary changes
> to the configuration files to accept selecting an ML40x type board.

You shouldn't need to do all of this; only the xparameters_ml40x.h
file is needed.  To get started, I'd just replace
xparameters_ml403.h with your custom xparameters_ml40x.h and use the
ml403 board port.  Once you've got it booting, you can get more fancy.

> Then I
> compiled the kernel and got the image file from arch/ppc/boot/images. (When
> configuring the kernel, the only device driver I selected was "8250/16550 and
> compatible serial support".) After that I downloaded the zImage.elf file to
> the target board and ran it. But there is no output from the serial port at
> all. Am I doing something wrong? I really don't know what goes wrong.
>     Can anyone here help me with this? Any help from you is appreciated. Thank
> you very much!

Make sure you've got 16550 console support enabled and
'console=/dev/ttyS0' in your kernel command line.

Cheers,
g.

-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.
grant.likely@secretlab.ca
(403) 399-0195

^ permalink raw reply

* Re: RFC: issues concerning the next NAPI interface
From: Jan-Bernd Themann @ 2007-08-24 18:11 UTC (permalink / raw)
  To: James Chapman
  Cc: Thomas Klein, Jan-Bernd Themann, Stefan Roscher, netdev,
	linux-kernel, Christoph Raisch, linux-ppc, akepner, Marcus Eder,
	Stephen Hemminger
In-Reply-To: <46CF127D.1090609@katalix.com>

James Chapman wrote:
> Stephen Hemminger wrote:
>> On Fri, 24 Aug 2007 17:47:15 +0200
>> Jan-Bernd Themann <ossthema@de.ibm.com> wrote:
>>
>>> Hi,
>>>
>>> On Friday 24 August 2007 17:37, akepner@sgi.com wrote:
>>>> On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
>>>>> .......
>>>>> 3) On modern systems the incoming packets are processed very fast.
>>>>>    Especially on SMP systems when we use multiple queues we process only
>>>>>    a few packets per napi poll cycle. So NAPI does not work very well
>>>>>    here and the interrupt rate is still high. What we need would be some
>>>>>    sort of timer polling mode which will schedule a device after a
>>>>>    certain amount of time for high load situations. With high precision
>>>>>    timers this could work well. Current usual timers are too slow. A
>>>>>    finer granularity would be needed to keep the latency down (and queue
>>>>>    length moderate).
>>>>>
>>>> We found the same on ia64-sn systems with tg3 a couple of years 
>>>> ago. Using simple interrupt coalescing ("don't interrupt until 
>>>> you've received N packets or M usecs have elapsed") worked 
>>>> reasonably well in practice. If your h/w supports that (and I'd 
>>>> guess it does, since it's such a simple thing), you might try it.
>>>>
>>> I don't see how this should work. Our latest machines are fast enough
>>> that they simply empty the queue during the first poll iteration (in most
>>> cases). Even if you wait until X packets have been received, it does not
>>> help for the next poll cycle. The average number of packets we process per
>>> poll queue is low. So a timer would be preferable that periodically polls
>>> the queue, without the need of generating a HW interrupt. This would allow
>>> us to wait until a reasonable amount of packets have been received in the
>>> meantime to keep the poll overhead low. This would also be useful in
>>> combination with LRO.
>>>
>>
>> You need hardware support for deferred interrupts. Most devices have it
>> (e1000, sky2, tg3) and it interacts well with NAPI. It is not a generic
>> thing you want done by the stack, you want the hardware to hold off
>> interrupts until X packets or Y usecs have expired.
>
> Does hardware interrupt mitigation really interact well with NAPI? In 
> my experience, holding off interrupts for X packets or Y usecs does 
> more harm than good; such hardware features are useful only when the 
> OS has no NAPI-like mechanism.
>
> When tuning NAPI drivers for packets/sec performance (which is a good 
> indicator of driver performance), I make sure that the driver stays in 
> NAPI polled mode while it has any rx or tx work to do. If the CPU is 
> fast enough that all work is always completed on each poll, I have the 
> driver stay in polled mode until dev->poll() is called N times with no 
> work being done. This keeps interrupts disabled for reasonable traffic 
> levels, while minimizing packet processing latency. No need for 
> hardware interrupt mitigation.

Yes, that was one idea as well. But the problem with that is that
net_rx_action will call the same poll function over and over again in a row
if there are no further network devices. With this approach you always poll
just a very few packets each time. This does not work well with LRO, as there
are no packets to aggregate... So it would make more sense to wait for a
certain time before trying it again.

Second problem: after jiffies has incremented by one in net_rx_action
(after some poll rounds), net_rx_action will quit and return control to the
softIRQ handler. The poll function is called again as the softIRQ handler
thinks there is more work to be done. So even then we do not wait... After
some rounds in the softIRQ handler, we finally wait some time.
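
For reference, the "stay in polled mode until N polls found no work" rule that James describes in the quoted text could look roughly like this. It is a minimal sketch, not taken from any driver: IDLE_POLL_LIMIT, struct poll_state and stay_in_polled_mode() are hypothetical names, and the limit of 4 is an arbitrary assumption.

```c
#include <stdbool.h>

#define IDLE_POLL_LIMIT 4 /* "N" from the text; the value is an assumption */

struct poll_state {
    unsigned int idle_polls; /* consecutive polls that found no work */
};

/*
 * Takes the amount of work done by one poll. Returns true to stay in
 * polled mode, false to leave it and re-enable interrupts.
 */
static bool stay_in_polled_mode(struct poll_state *s, unsigned int work_done)
{
    if (work_done > 0) {
        s->idle_polls = 0;
        return true;
    }

    s->idle_polls++;
    if (s->idle_polls >= IDLE_POLL_LIMIT) {
        s->idle_polls = 0;
        return false;
    }
    return true;
}
```

Note that the sketch only counts idle polls; it does nothing to space them out in time, which is exactly the objection raised above.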

>
>> The parameters for controlling it are already in ethtool, the issue is
>> finding a good default set of values for a wide range of applications and
>> architectures. Maybe some heuristic based on processor speed would be a
>> good starting point. The dynamic irq moderation stuff is not widely used
>> because it is too hard to get right.
>
> I agree. It would be nice to find a way for the typical user to derive 
> best values for these knobs for his/her particular system. Perhaps a 
> tool using pktgen and network device phy internal loopback could be 
> developed?
>

^ permalink raw reply

* Re: RFC: issues concerning the next NAPI interface
From: Shirley Ma @ 2007-08-24 17:45 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: Thomas Klein, Jan-Bernd Themann, Stefan Roscher, netdev,
	linux-kernel, Christoph Raisch, netdev-owner, linux-ppc, akepner,
	Eder, Jan-Bernd Themann, Stephen Hemminger, Marcus
In-Reply-To: <20070824165110.GH4282@austin.ibm.com>

> Just to be clear, in the previous email I posted on this thread, I
> described a worst-case network ping-pong test case (send a packet, wait
> for reply), and found out that a deferred interrupt scheme just damaged
> the performance of the test case.

When splitting the rx and tx handlers, I found some performance gain from
a deferred interrupt scheme on tx, not rx, in the IPoIB driver.

Shirley

^ permalink raw reply

* Re: RFC: issues concerning the next NAPI interface
From: Rick Jones @ 2007-08-24 17:07 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: Thomas Klein, Jan-Bernd Themann, Stefan Roscher, netdev,
	linux-kernel, Christoph Raisch, linux-ppc, Jan-Bernd Themann,
	Eder, akepner, Stephen Hemminger, Marcus
In-Reply-To: <20070824165110.GH4282@austin.ibm.com>

> Just to be clear, in the previous email I posted on this thread, I
> described a worst-case network ping-pong test case (send a packet, wait
> for reply), and found out that a deferred interrupt scheme just damaged
> the performance of the test case.  Since the folks who came up with the
> test case were adamant, I turned off the deferred interrupts.
> While deferred interrupts are an "obvious" solution, I decided that
> they weren't a good solution. (And I have no other solution to offer).

Sounds exactly like the default netperf TCP_RR test and any number of other
benchmarks.  The "send a request, wait for reply, send next request, etc."
pattern is a rather common application behaviour after all.

rick jones

^ permalink raw reply

* Re: RFC: issues concerning the next NAPI interface
From: Linas Vepstas @ 2007-08-24 16:51 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Thomas Klein, Jan-Bernd Themann, netdev, linux-kernel,
	Christoph Raisch, linux-ppc, Jan-Bernd Themann, Eder, akepner,
	Stefan Roscher, Marcus
In-Reply-To: <20070824085203.42f4305c@freepuppy.rosehill.hemminger.net>

On Fri, Aug 24, 2007 at 08:52:03AM -0700, Stephen Hemminger wrote:
> 
> You need hardware support for deferred interrupts. Most devices have it (e1000, sky2, tg3)
> and it interacts well with NAPI. It is not a generic thing you want done by the stack,
> you want the hardware to hold off interrupts until X packets or Y usecs have expired.

Just to be clear, in the previous email I posted on this thread, I
described a worst-case network ping-pong test case (send a packet, wait
for reply), and found out that a deferred interrupt scheme just damaged
the performance of the test case.  Since the folks who came up with the
test case were adamant, I turned off the deferred interrupts.
While deferred interrupts are an "obvious" solution, I decided that
they weren't a good solution. (And I have no other solution to offer).

--linas

^ permalink raw reply

* Re: RFC: issues concerning the next NAPI interface
From: David Stevens @ 2007-08-24 16:50 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Thomas Klein, Jan-Bernd Themann, netdev, linux-kernel,
	Christoph Raisch, netdev-owner, linux-ppc, akepner, Marcus Eder,
	Jan-Bernd Themann, Stefan Roscher
In-Reply-To: <20070824085203.42f4305c@freepuppy.rosehill.hemminger.net>

Stephen Hemminger <shemminger@linux-foundation.org> wrote on 08/24/2007 
08:52:03 AM:

> 
> You need hardware support for deferred interrupts. Most devices have it
> (e1000, sky2, tg3) and it interacts well with NAPI. It is not a generic
> thing you want done by the stack, you want the hardware to hold off
> interrupts until X packets or Y usecs have expired.

        For generic hardware that doesn't support it, couldn't you use an
estimator and adjust the timer dynamically in software based on sampled
values? Switch to per-packet interrupts when the receive rate is low...
        Actually, that's how I thought NAPI worked before I found out
otherwise (ie, before I looked :-)).

        The hardware-accelerated one is essentially siloing as done by
ancient serial devices on UNIX systems. If you had a tunable for a target
count, and an estimator for the time interval, then switch to per-packet
when the estimator exceeds a tunable max threshold (and also, I suppose, if
you near overflowing the ring on the min timer granularity), you get almost
all of it, right?
        Problem is if it increases rapidly, you may drop packets before you
notice that the ring is full in the current estimated interval.
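
One way to picture the estimator idea is the sketch below. It assumes an exponentially weighted moving average of the packet inter-arrival time with weight 1/8, which is a choice made here for illustration and not something the message specifies. All names (struct rx_estimator and the three functions) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software estimator state. */
struct rx_estimator {
    uint32_t avg_gap_us;      /* smoothed inter-arrival time, microseconds */
    uint32_t target_count;    /* tunable: packets wanted per timer interval */
    uint32_t max_interval_us; /* tunable: longest acceptable timer interval */
};

/* Fold one measured inter-arrival gap into the average with weight 1/8. */
static void rx_estimator_sample(struct rx_estimator *e, uint32_t gap_us)
{
    int64_t diff = (int64_t)gap_us - (int64_t)e->avg_gap_us;

    e->avg_gap_us = (uint32_t)((int64_t)e->avg_gap_us + diff / 8);
}

/* Estimated time needed to collect target_count packets. */
static uint64_t rx_timer_interval_us(const struct rx_estimator *e)
{
    return (uint64_t)e->avg_gap_us * e->target_count;
}

/* At low receive rates the interval grows too long: go per-packet. */
static bool rx_use_per_packet_irq(const struct rx_estimator *e)
{
    return rx_timer_interval_us(e) > e->max_interval_us;
}
```

The sketch leaves out the ring-overflow check mentioned above, so it inherits the stated problem: a sudden rate increase is only noticed after the average has caught up.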

 +-DLS

^ permalink raw reply

* Re: RFC: issues concerning the next NAPI interface
From: Linas Vepstas @ 2007-08-24 16:45 UTC (permalink / raw)
  To: Jan-Bernd Themann
  Cc: Thomas Klein, Jan-Bernd Themann, netdev, linux-kernel, linux-ppc,
	Christoph Raisch, Marcus Eder, Stefan Roscher
In-Reply-To: <200708241559.17055.ossthema@de.ibm.com>

On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> 3) On modern systems the incoming packets are processed very fast. Especially
>    on SMP systems when we use multiple queues we process only a few packets
>    per napi poll cycle. So NAPI does not work very well here and the interrupt 
>    rate is still high. 

I saw this too, on a system that is "modern" but not terribly fast, and
only slightly (2-way) smp. (the spidernet)

I experimented with various solutions, none were terribly exciting.  The
thing that killed all of them was a crazy test case that someone sprung on
me:  They had written a worst-case network ping-pong app: send one
packet, wait for reply, send one packet, etc.  

If I waited (indefinitely) for a second packet to show up, the test case 
completely stalled (since no second packet would ever arrive).  And if I 
introduced a timer to wait for a second packet, then I just increased 
the latency in the response to the first packet, and this was noticed, 
and folks complained.  

In the end, I just let it be, and let the system work as a busy-beaver, 
with the high interrupt rate. Is this a wise thing to do?  I was
thinking that, if the system is under heavy load, then the interrupt
rate would fall, since (for less pathological network loads) more 
packets would queue up before the poll was serviced.  But I did not
actually measure the interrupt rate under heavy load ... 

--linas

^ permalink raw reply

* Re: [linux kernel 2.6.19] ethernet driver for Marvell bridge GT-64260
From: Dale Farnsworth @ 2007-08-24 16:13 UTC (permalink / raw)
  To: joachim.bader, linuxppc-embedded
In-Reply-To: <OF2C2E8C6F.3860730F-ONC1257341.004B3D4B-C1257341.004C2882@diehl-gruppe.de>

> I continue the work of ThomasB to get a linux kernel 2.6.19 up and running
> on a PPC750FX based platform using a GT-64260.
> From Thomas I got a patched mv643xx_eth driver which is ported to support
> the 64260, too.
> Now the driver is integrated and runs completely through initialisation,
> bringing up an eth0 device.
> The initialisation seems to be ok (no error messages), the device is up
> and routing information is set up.
> Now I try to ping a remote station but I receive nothing and I get no
> feedback from the remote computer.
> As soon as I turn off the net interface (ifconfig eth0 down) I get the
> message "Tx time out or no link?"
> Is there anything else I can check? Any idea what may cause this problem?

Steven Hill posted his patch series merging gt64260 ethernet support to
the netdev mailing list last month.  You might look at:

	http://lists.openwall.net/netdev/2007/07/19/22

and related patches.

The differences between 64260 and 64360 are so large that they are
unlikely to be merged into a single driver, but a separate driver
is a possibility if someone steps up to do the work.

-Dale

^ permalink raw reply

* 8555CDS BSP on 8548CDS board
From: mike zheng @ 2007-08-24 16:02 UTC (permalink / raw)
  To: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 207 bytes --]

 Hi,

I was told Freescale's 8555CDS board is very similar to 8548CDS board. I
just wonder what exactly the differences are. can I just put the 8555CDS BSP
onto the 8548CDS board?

Thanks  in advance,

Mike

[-- Attachment #2: Type: text/html, Size: 356 bytes --]

^ permalink raw reply

* 8555CDS BSP on 8548CDS board
From: mike zheng @ 2007-08-24 16:00 UTC (permalink / raw)
  To: linuxppc-embedded

[-- Attachment #1: Type: text/plain, Size: 206 bytes --]

Hi,

I was told Freescale's 8555CDS board is very similar to 8548CDS board. I
just wonder what exactly the differences are. can I just put the 8555CDS BSP
onto the 8548CDS board?

Thanks  in advance,

Mike

[-- Attachment #2: Type: text/html, Size: 310 bytes --]

^ permalink raw reply

* Re: gdbserver and c++
From: khollan @ 2007-08-24 15:57 UTC (permalink / raw)
  To: linuxppc-embedded
In-Reply-To: <12282029.post@talk.nabble.com>


Any suggestions? I'm completely lost.

-- 
View this message in context: http://www.nabble.com/gdbserver-and-c%2B%2B-tf4313806.html#a12315137
Sent from the linuxppc-embedded mailing list archive at Nabble.com.

^ permalink raw reply

* RE: Only one phy can be accessed through ioctls to a socket (patch available)
From: DI BACCO ANTONIO - technolabs @ 2007-08-24 15:56 UTC (permalink / raw)
  To: Andy Fleming; +Cc: linuxppc-embedded
In-Reply-To: <B999A34D-92BD-41D1-852A-45C4E2258712@freescale.com>


> I'd certainly be interested in seeing it.

There is not much to see, only a few lines in phy_device.c:

/* get_existing_phy_device:
 *
 * description: returns a phy device with the given address
 * if it exists
 */
static int phy_compare_addr(struct device *dev, void *data)
{
        return (*((int *)data) == to_phy_device(dev)->addr) ? 1 : 0;
}

struct phy_device *get_existing_phy_device(int addr)
{
        struct bus_type *bus = &mdio_bus_type;
        struct device *d;

        /* Search the list of PHY devices on the mdio bus for the
         * PHY with the requested address */
        d = bus_find_device(bus, NULL, (void *)&addr, phy_compare_addr);
        if (d)
                return to_phy_device(d);

        return NULL;
}
EXPORT_SYMBOL(get_existing_phy_device);


And a small change in fs_enet-main.c:


static int fs_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
{
        struct fs_enet_private *fep = netdev_priv(dev);
        struct mii_ioctl_data *mii = (struct mii_ioctl_data *)&rq->ifr_data;
        struct phy_device *phydev = fep->phydev;
        unsigned long flags;
        int rc;

        if (!netif_running(dev) && (phydev->addr == mii->phy_id))
                return -EINVAL;

        /* A different PHY address was requested: look it up on the mdio bus */
        if (phydev->addr != mii->phy_id) {
                phydev = get_existing_phy_device(mii->phy_id);
                if (!phydev)
                        return -ENODEV;
        }

        spin_lock_irqsave(&fep->lock, flags);
        rc = phy_mii_ioctl(phydev, mii, cmd);
        spin_unlock_irqrestore(&fep->lock, flags);
        return rc;
}

^ permalink raw reply

* Re: RFC: issues concerning the next NAPI interface
From: Stephen Hemminger @ 2007-08-24 15:52 UTC (permalink / raw)
  To: Jan-Bernd Themann
  Cc: Thomas Klein, Marcus, Jan-Bernd Themann, netdev, linux-kernel,
	Christoph Raisch, linux-ppc, akepner, Eder, Stefan Roscher
In-Reply-To: <200708241747.16592.ossthema@de.ibm.com>

On Fri, 24 Aug 2007 17:47:15 +0200
Jan-Bernd Themann <ossthema@de.ibm.com> wrote:

> Hi,
>=20
> On Friday 24 August 2007 17:37, akepner@sgi.com wrote:
> > On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> > > .......
> > > 3) On modern systems the incoming packets are processed very fast. Es=
pecially
> > > =C2=A0 =C2=A0on SMP systems when we use multiple queues we process on=
ly a few packets
> > > =C2=A0 =C2=A0per napi poll cycle. So NAPI does not work very well her=
e and the interrupt=20
> > > =C2=A0 =C2=A0rate is still high. What we need would be some sort of t=
imer polling mode=20
> > > =C2=A0 =C2=A0which will schedule a device after a certain amount of t=
ime for high load=20
> > > =C2=A0 =C2=A0situations. With high precision timers this could work w=
ell. Current
> > > =C2=A0 =C2=A0usual timers are too slow. A finer granularity would be =
needed to keep the
> > >    latency down (and queue length moderate).
> > >=20
> >=20
> > We found the same on ia64-sn systems with tg3 a couple of years=20
> > ago. Using simple interrupt coalescing ("don't interrupt until=20
> > you've received N packets or M usecs have elapsed") worked=20
> > reasonably well in practice. If your h/w supports that (and I'd=20
> > guess it does, since it's such a simple thing), you might try=20
> > it.
> >=20
>=20
> I don't see how this should work. Our latest machines are fast enough tha=
t they
> simply empty the queue during the first poll iteration (in most cases).
> Even if you wait until X packets have been received, it does not help for
> the next poll cycle. The average number of packets we process per poll qu=
eue
> is low. So a timer would be preferable that periodically polls the=20
> queue, without the need of generating a HW interrupt. This would allow us
> to wait until a reasonable amount of packets have been received in the me=
antime
> to keep the poll overhead low. This would also be useful in combination
> with LRO.
>=20

You need hardware support for deferred interrupts. Most devices have it (e1=
000, sky2, tg3)
and it interacts well with NAPI. It is not a generic thing you want done by=
 the stack,
you want the hardware to hold off interrupts until X packets or Y usecs hav=
e expired.

The parameters for controlling it are already in ethtool, the issue is find=
ing a good
default set of values for a wide range of applications and architectures. M=
aybe some
heuristic based on processor speed would be a good starting point. The dyna=
mic irq
moderation stuff is not widely used because it is too hard to get right.

--=20
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply

* Re: little endian page mapping on PQ3
From: David Hawkins @ 2007-08-24 15:49 UTC (permalink / raw)
  To: Jose Almeida; +Cc: Joyeau Sylvain, Linuxppc-embedded
In-Reply-To: <46CED851.70002@sysgo.fr>

Hi Jose,

> I want to do this using an mmap() entry point in a driver, in order to map
> this to the user. Of course in that case ioremap() does not work.
> 
> Any Clue ?
> 

I used the little-endian flag on the Yosemite board (440EP)
to test what the flag did.

http://www.ovro.caltech.edu/~dwh/correlator/pdf/LNX-762-Hawkins.pdf
http://www.ovro.caltech.edu/~dwh/correlator/software/driver_design.tar.gz

Look at the mmap function in pci_io.c.

	/* PowerPC endian control
	 * - default is cleared, big-endian
	 */
#ifdef _PAGE_ENDIAN
	if (bar->little_endian) {
		pgprot_val(vma->vm_page_prot) |= _PAGE_ENDIAN;
	} else {
		pgprot_val(vma->vm_page_prot) &= ~_PAGE_ENDIAN;
	}
	if (pgprot_val(vma->vm_page_prot) & _PAGE_ENDIAN) {
		LOG_DEBUG("_PAGE_ENDIAN is set\n");
	} else {
		LOG_DEBUG("_PAGE_ENDIAN is not set\n");
	}
#endif

It might be the same for the PQ3 ... at least it'll be
pretty similar.

Dave

^ permalink raw reply

* Re: RFC: issues concerning the next NAPI interface
From: Jan-Bernd Themann @ 2007-08-24 15:47 UTC (permalink / raw)
  To: akepner
  Cc: Thomas Klein, Jan-Bernd Themann, netdev, linux-kernel, linux-ppc,
	Christoph Raisch, Marcus Eder, Stefan Roscher
In-Reply-To: <20070824153703.GN5592@sgi.com>

Hi,

On Friday 24 August 2007 17:37, akepner@sgi.com wrote:
> On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> > .......
> > 3) On modern systems the incoming packets are processed very fast. Especially
> >    on SMP systems when we use multiple queues we process only a few packets
> >    per napi poll cycle. So NAPI does not work very well here and the interrupt
> >    rate is still high. What we need would be some sort of timer polling mode
> >    which will schedule a device after a certain amount of time for high load
> >    situations. With high precision timers this could work well. Current
> >    usual timers are too slow. A finer granularity would be needed to keep the
> >    latency down (and queue length moderate).
> >
>
> We found the same on ia64-sn systems with tg3 a couple of years
> ago. Using simple interrupt coalescing ("don't interrupt until
> you've received N packets or M usecs have elapsed") worked
> reasonably well in practice. If your h/w supports that (and I'd
> guess it does, since it's such a simple thing), you might try
> it.
>

I don't see how this should work. Our latest machines are fast enough that they
simply empty the queue during the first poll iteration (in most cases).
Even if you wait until X packets have been received, it does not help for
the next poll cycle. The average number of packets we process per poll queue
is low. So a timer would be preferable that periodically polls the
queue, without the need of generating a HW interrupt. This would allow us
to wait until a reasonable amount of packets have been received in the meantime
to keep the poll overhead low. This would also be useful in combination
with LRO.

Regards,
Jan-Bernd

^ permalink raw reply

* Re: RFC: issues concerning the next NAPI interface
From: akepner @ 2007-08-24 15:37 UTC (permalink / raw)
  To: Jan-Bernd Themann
  Cc: Thomas Klein, Jan-Bernd Themann, netdev, linux-kernel, linux-ppc,
	Christoph Raisch, Marcus Eder, Stefan Roscher
In-Reply-To: <200708241559.17055.ossthema@de.ibm.com>

On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> .......
> 3) On modern systems the incoming packets are processed very fast. Especially
>    on SMP systems when we use multiple queues we process only a few packets
>    per napi poll cycle. So NAPI does not work very well here and the interrupt 
>    rate is still high. What we need would be some sort of timer polling mode 
>    which will schedule a device after a certain amount of time for high load 
>    situations. With high precision timers this could work well. Current
>    usual timers are too slow. A finer granularity would be needed to keep the
>    latency down (and queue length moderate).
> 

We found the same on ia64-sn systems with tg3 a couple of years 
ago. Using simple interrupt coalescing ("don't interrupt until 
you've received N packets or M usecs have elapsed") worked 
reasonably well in practice. If your h/w supports that (and I'd 
guess it does, since it's such a simple thing), you might try 
it.
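
The coalescing rule quoted above ("N packets or M usecs") amounts to a simple predicate. The sketch below models it in software purely for illustration; real hardware implements this internally, and the struct and function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical coalescing settings. */
struct coalesce_params {
    uint32_t max_frames; /* "N": interrupt once this many packets are pending */
    uint32_t max_usecs;  /* "M": or once the oldest packet waited this long */
};

/* Decide whether the device should raise an rx interrupt now. */
static bool should_raise_irq(const struct coalesce_params *p,
                             uint32_t pending_frames,
                             uint32_t usecs_since_first)
{
    if (pending_frames == 0)
        return false; /* nothing to report */

    return pending_frames >= p->max_frames ||
           usecs_since_first >= p->max_usecs;
}
```

A lone packet is only reported once max_usecs has elapsed, which is the latency cost that matters for request/response traffic.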

-- 
Arthur

^ permalink raw reply

* Re: [PATCH 05/20] bootwrapper: flatdevtree fixes
From: Scott Wood @ 2007-08-24 14:48 UTC (permalink / raw)
  To: linuxppc-dev, paulus
In-Reply-To: <20070824010122.GA7281@localhost.localdomain>

On Fri, Aug 24, 2007 at 11:01:22AM +1000, David Gibson wrote:
> On Thu, Aug 23, 2007 at 12:48:30PM -0500, Scott Wood wrote:
> > It's likely to be ugly no matter what, though I'll try to come up with 
> > something slightly nicer.  If I were doing this code from scratch, I'd 
> > probably liven the tree first and reflatten it to pass to the kernel.
> 
> Eh, probably not worth bothering doing an actual implementation at
> this stage - I'll have to redo it for libfdt anyway.

Too late, I already wrote it -- it wasn't as bad as I thought it would
be.

> flatdevtree uses some of the information it caches in the phandle
> context stuff to remember who's the parent of a node.  libfdt uses raw
> offsets into the structure, so the *only* way to implement
> get_parent() is to rescan the dt from the beginning, keeping track of
> parents until reaching the given node.
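
For illustration, the rescanning approach described in the quoted text
can be sketched roughly as follows. This is a toy model only: the token
list stands in for the flattened structure block, the BEGIN/END tokens
are not the real flat device tree tags, and get_parent here is a
hypothetical helper, not the libfdt function.

```python
BEGIN, END = "begin", "end"   # toy tokens, not the real FDT tags


def get_parent(tokens, target_offset):
    """Return the offset of the parent of the node at target_offset.

    Scans from the start of the token list, keeping a stack of the
    nodes that are currently open. Returns None for the root node.
    """
    stack = []  # offsets of nodes opened but not yet closed
    for offset, (kind, _name) in enumerate(tokens):
        if kind == BEGIN:
            if offset == target_offset:
                return stack[-1] if stack else None
            stack.append(offset)
        elif kind == END:
            stack.pop()
    raise ValueError("no node begins at offset %d" % target_offset)


# A tiny tree: / { cpus { cpu@0 { } } soc { } }
tree = [
    (BEGIN, "/"), (BEGIN, "cpus"), (BEGIN, "cpu@0"), (END, None),
    (END, None), (BEGIN, "soc"), (END, None), (END, None),
]
```

Every lookup walks the structure from the beginning, so the cost grows
with the position of the node in the tree; no parent pointers are cached.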

What is the benefit of doing it that way?

-Scott

^ permalink raw reply

* [linux kernel 2.6.19] ethernet driver for Marvell bridge GT-64260
From: joachim.bader @ 2007-08-24 13:51 UTC (permalink / raw)
  To: linuxppc-embedded

Hello everybody,

I am continuing the work of ThomasB to get a Linux kernel 2.6.19 up and
running on a PPC750FX based platform using a GT-64260.
From Thomas I got a patched mv643xx_eth driver which has been ported to
support the 64260, too.
The driver is now integrated and runs completely through initialisation,
bringing up an eth0 device.
The initialisation seems to be ok (no error messages), the device is up
and routing information is set up.
When I try to ping a remote station, I receive nothing and I get no
feedback from the remote computer.
As soon as I turn off the net interface (ifconfig eth0 down) I get the
message "Tx time out or no link?"
Is there anything else I can check? Any idea what may cause this problem?

Thanks a lot for your help

Joachim

_______________________________________________________________________________________________________________________
The content of this e-mail is not legally binding upon the sender.
If this e-mail was transmitted to you by error, then please inform us accordingly (Fax: +49-69-5805-1399). In such case you are requested to erase the message. Any use of such e-mail message is strictly prohibited.

^ permalink raw reply

* RFC: issues concerning the next NAPI interface
From: Jan-Bernd Themann @ 2007-08-24 13:59 UTC (permalink / raw)
  To: netdev
  Cc: Thomas Klein, Jan-Bernd Themann, linux-kernel, linux-ppc,
	Christoph Raisch, Marcus Eder, Stefan Roscher

Hi,

when I tried to get the eHEA driver working with the new interface,
the following issues came up.

1) The current implementations of netif_rx_schedule, netif_rx_complete
   and net_rx_action have the following problem: netif_rx_schedule
   sets the NAPI_STATE_SCHED flag and adds the NAPI instance to the
   poll_list. net_rx_action checks NAPI_STATE_SCHED; if it is set, it adds
   the device to the poll_list again (as well). netif_rx_complete clears
   NAPI_STATE_SCHED.
   If an interrupt handler calls netif_rx_schedule on CPU 2
   after netif_rx_complete has been called on CPU 1 (and the poll function
   has not returned yet), the NAPI instance will be added twice to the
   poll_list (by netif_rx_schedule and net_rx_action). Problems occur when
   netif_rx_complete is called twice for the device (BUG() called).
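
The interleaving in item 1 can be replayed step by step in a toy model.
This sketch only mirrors the sequence as described above and is not
kernel code; the two lists, the instance name "rxq0" and the assumption
that net_rx_action takes the instance off the list before calling poll
are simplifications made for illustration.

```python
def replay_race():
    """Replay the interleaving from item 1; return how often the instance is queued."""
    sched = False                   # models NAPI_STATE_SCHED
    cpu1_list, cpu2_list = [], []   # models the per-CPU poll lists
    napi = "rxq0"                   # hypothetical NAPI instance

    def rx_schedule(poll_list):
        # Models netif_rx_schedule: set the flag, queue only if it was clear.
        nonlocal sched
        if not sched:
            sched = True
            poll_list.append(napi)

    # An interrupt on CPU 1 schedules the instance.
    rx_schedule(cpu1_list)
    # net_rx_action on CPU 1 takes the instance and enters its poll function.
    cpu1_list.remove(napi)
    # Inside poll, the queue is empty: netif_rx_complete clears the flag.
    sched = False
    # Before poll returns, an interrupt on CPU 2 schedules the instance again.
    rx_schedule(cpu2_list)
    # poll returns on CPU 1; net_rx_action sees the flag set and re-queues.
    if sched:
        cpu1_list.append(napi)

    return cpu1_list.count(napi) + cpu2_list.count(napi)
```

In this model the instance ends up queued once on each CPU, so two poll
runs can each reach netif_rx_complete for the same instance.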

2) If an ethernet chip supports multiple receive queues, the queues are
   currently all processed on the CPU where the interrupt comes in. This
   is because netif_rx_schedule will always add the rx queue to the CPU's
   napi poll_list. The result under heavy pressure is that all queues will
   gather on the weakest CPU (with highest CPU load) after some time, as
   they will stay there until the entire queue is emptied. On SMP systems
   this behaviour is not desired. It should also work well without
   interrupt pinning.
   It would be nice if it were possible to schedule queues to other CPUs,
   or at least to use interrupts to put the queue on another CPU (not nice,
   as you never know which one you will hit).
   I'm not sure how bad the tradeoff would be.

3) On modern systems the incoming packets are processed very fast. Especially
   on SMP systems when we use multiple queues we process only a few packets
   per napi poll cycle. So NAPI does not work very well here and the interrupt
   rate is still high. What we need would be some sort of timer polling mode
   which will schedule a device after a certain amount of time for high load
   situations. With high precision timers this could work well. Current
   usual timers are too slow. A finer granularity would be needed to keep the
   latency down (and queue length moderate).
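
To illustrate the granularity tradeoff in item 3, here is a rough
sketch of fixed-interval timer polling. It is a toy model under stated
assumptions, not a proposal for the NAPI code: the function timer_poll
and its parameters are invented for this example, polls happen at
multiples of interval_us, and a packet is picked up by the first poll
strictly after its arrival.

```python
from collections import Counter


def timer_poll(arrivals_us, interval_us):
    """Return (polls_with_work, longest_queue, worst_latency_us)."""
    if not arrivals_us:
        return (0, 0, 0)

    def poll_time(t):
        # First timer tick strictly after the packet arrived.
        return (t // interval_us + 1) * interval_us

    # Number of packets found waiting at each tick that had work to do.
    per_tick = Counter(poll_time(t) for t in arrivals_us)
    worst_latency = max(poll_time(t) - t for t in arrivals_us)
    return (len(per_tick), max(per_tick.values()), worst_latency)
```

With one packet every 5 usecs over 100 usecs, a 20 usec interval needs
five polls with at most four packets queued, while a 50 usec interval
needs only two polls but lets ten packets queue up and more than doubles
the worst-case latency.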

What do you think?

Thanks,
Jan-Bernd

^ permalink raw reply

* Re: [PATCH] Remove barriers from the SLB shadow buffer update
From: Josh Boyer @ 2007-08-23 20:36 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, paulus
In-Reply-To: <3992.1187938717@neuling.org>

On Fri, 2007-08-24 at 16:58 +1000, Michael Neuling wrote:
> After talking to an IBM POWER hypervisor design and development (PHYP)
> guy, there seems to be no need for memory barriers when updating the SLB
> shadow buffer provided we only update it from the current CPU, which we
> do.
> 
> Also, these guys see no need in the future for these barriers.

Does this result in a significant performance gain?  I'm just curious.

josh

^ permalink raw reply

* Re: little endian page mapping on PQ3
From: Clemens Koller @ 2007-08-24 13:19 UTC (permalink / raw)
  To: Jose Almeida; +Cc: Linuxppc-embedded
In-Reply-To: <46CE8FD5.2060609@sysgo.fr>

Hi, Jose!

Jose Almeida wrote:
> Hi all,
> 
> Looking at PQ3 documentation, it looks like there is a way to select
> on a page basis if we would like to map one particular page in BIG or
> LITTLE endian.
> This is a very nice feature when you need to exchange some data
> between a PC and a PQ3 target.

I would be interested, but I cannot spend time on that subject
right now.

> I am wondering if someone has already tried this PQ3 feature?
> I guess this would require some kind of hook in the kernel ...

Just some random bits I found on the web:
http://developer.apple.com/documentation/Hardware/DeviceManagers/pci_srvcs/pci_cards_drivers/PCI_BOOK.250.html

The interesting part is:
"Thus, the address swizzle is completely transparent to software."

So, I would just try to set up some memory mapping and turn on little endian
mode to access that area... MMMV. Just a guess.
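
As a reminder of what is at stake when exchanging data between a
little-endian PC and a big-endian target, here is a small sketch of the
software byte swap that a little-endian page mapping would make
unnecessary. It says nothing about how the PQ3 MMU is programmed; the
value 0x12345678 is just an example.

```python
import struct

value = 0x12345678

# The bytes as a little-endian PC would write them to shared memory.
pc_bytes = struct.pack("<I", value)

# A big-endian CPU reading the same bytes natively sees a swapped value.
misread = struct.unpack(">I", pc_bytes)[0]

# Reading the bytes as little-endian recovers the original value.
recovered = struct.unpack("<I", pc_bytes)[0]
```

Here misread is 0x78563412 while recovered equals the original value.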

Regards,
-- 
Clemens Koller
__________________________________
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Straße 45/1
Linhof Werksgelände
D-81379 München
Tel.089-741518-50
Fax 089-741518-19
http://www.anagramm-technology.com

^ permalink raw reply
