From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nebojša Ćosić
Subject: Re: UDP jitter
Date: Wed, 6 Nov 2013 12:57:09 +0100
Message-ID: <20131106125709.2e091182@sth491dt.servo.net>
References: <20130429222238.2b440d8c@sanja.asnn.org> <517FE1A5.1090702@osadl.org> <20130430192653.5c6c08b6@sanja.asnn.org> <12C1B74BDFD05D40B2356A9B12DFA33967BEB68107@KEBMXSPMB01.keba.co.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: linux-rt-users
To: eg Engleder Gerhard
Return-path:
Received: from smtprelay-b12.telenor.se ([62.127.194.21]:56490 "EHLO smtprelay-b12.telenor.se" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932179Ab3KFMZt convert rfc822-to-8bit (ORCPT ); Wed, 6 Nov 2013 07:25:49 -0500
Received: from ipb4.telenor.se (ipb4.telenor.se [195.54.127.167]) by smtprelay-b12.telenor.se (Postfix) with ESMTP id 01008E8567 for ; Wed, 6 Nov 2013 12:57:20 +0100 (CET)
In-Reply-To: <12C1B74BDFD05D40B2356A9B12DFA33967BEB68107@KEBMXSPMB01.keba.co.at>
Sender: linux-rt-users-owner@vger.kernel.org
List-ID:

> Hello Nebojša,
>
> I have a similar problem now with 3.2.51-rt72. Did
> you find any solution?
>
> regards, gerhard
>
> > -----Original Message-----
> > From: linux-rt-users-owner@vger.kernel.org
> > [mailto:linux-rt-users-owner@vger.kernel.org] On Behalf Of
> > Nebojša Cosic
> > Sent: Tuesday, 30 April 2013 19:27
> > To: Carsten Emde
> > Cc: linux-rt-users
> > Subject: Re: UDP jitter
> >
> >
> > > Hi Nebojša,
> > Hi Carsten
> > >
> > > > I am doing some work on a product running kernel 2.6.33.7.2-rt30.
> > > > Applications running on this kernel are a bit specific, meaning that
> > > > there are a number of threads running at different priorities.
> > > > For several months I was haunted by spurious jitter, detected on
> > > > UDP messages: multicast UDP messages were received on the originating
> > > > node without any delay, but on other nodes a delay in the range of tens
> > > > of milliseconds was detected. Simply, it looked like a message was
> > > > stuck in the kernel before finally getting transmitted.
> > > > Finally, thanks to the LTTng tool, I was able to narrow the problem down
> > > > to this piece of code in net/sched/sch_generic.c:
> > > >
> > > > int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
> > > >                     struct net_device *dev, struct netdev_queue *txq,
> > > >                     spinlock_t *root_lock)
> > > > {
> > > >         int ret = NETDEV_TX_BUSY;
> > > >
> > > >         /* And release qdisc */
> > > >         spin_unlock(root_lock);
> > > >
> > > >         HARD_TX_LOCK(dev, txq);
> > > >
> > > >         if (!netif_tx_queue_stopped(txq) && !netif_tx_queue_frozen(txq))
> > > >                 ret = dev_hard_start_xmit(skb, dev, txq);
> > > >
> > > >
> > > >         HARD_TX_UNLOCK(dev, txq);
> > > >
> > > >         spin_lock(root_lock);
> > > >         ...
> > > >
> > > > When the transmit queue is empty, a thread wanting to send a message comes
> > > > directly to sch_direct_xmit, without changing context. It then
> > > > releases one spin lock, and then takes another. So far so good.
> > > > If this starting thread is of lower priority, it can be preempted by
> > > > another thread while still being in the dev_hard_start_xmit function.
> > > > This thread will check if HARD_TX_LOCK is taken, and if so, go on
> > > > and queue its own message.
> > > > If there are enough higher priority tasks, tx can be stalled
> > > > indefinitely. [..]
> > > Did you increase the priority of the related sirq-net-tx and
> > > sirq-net-rx kernel threads appropriately? Some more details on
> > > enabling real-time Ethernet are given here ->
> > > https://www.osadl.org/?id=930.
> > Thanks for the link, I was aware of it.
> > I did try to increase the priority of sirq-net-tx and rx, even setting tx
> > higher than rx (in case incoming traffic was creating
> > problems), but it didn't make any difference.
> > I was trying to isolate the problem by running iperf, but it
> > worked perfectly well when run on its own. No wonder,
> > because it generates traffic from a single priority, and to
> > trigger this behaviour, one needs traffic from at least two
> > levels of priority, and a busy CPU (so that a low priority
> > thread can get blocked in the driver for a noticeable period of time).
> > I suppose that running two iperf processes at different
> > priorities can demonstrate the problem.
> >
> > >
> > > -Carsten.
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe
> > > linux-rt-users" in the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> > --
> > Nebojša

You can try with this patch. I am quite sure that the same problem persists on
all newer kernels (I am using 2.6.33), but I never had the time to create
a simple test to prove it.

Index: net/sched/sch_generic.c
===================================================================
--- net/sched/sch_generic.c	(revision 1709)
+++ net/sched/sch_generic.c	(revision 1710)
@@ -120,16 +120,18 @@
 	int ret = NETDEV_TX_BUSY;
 
 	/* And release qdisc */
-	spin_unlock(root_lock);
+/*	spin_unlock(root_lock);
 
 	HARD_TX_LOCK(dev, txq);
+*/
 	if (!netif_tx_queue_stopped(txq) && !netif_tx_queue_frozen(txq))
 		ret = dev_hard_start_xmit(skb, dev, txq);
 
+/*
 	HARD_TX_UNLOCK(dev, txq);
 
 	spin_lock(root_lock);
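
The stall described in the quoted message can be illustrated with a small
user-space toy model. This is only a sketch under stated assumptions, not
kernel code: tx_model, submit and finish_xmit are hypothetical names, the
single xmit_in_progress flag stands in for whatever state makes later senders
queue instead of transmitting, and the idea that the direct sender flushes the
backlog once it runs again is an assumption of the sketch rather than
something the thread confirms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
enum submit_result { SENT_DIRECT, QUEUED };

struct tx_model {
	bool   xmit_in_progress; /* a sender is "inside dev_hard_start_xmit" */
	size_t backlog;          /* messages queued behind that sender */
	size_t sent;             /* messages that actually left the node */
};

/*
 * A sender arrives. If nobody is transmitting, it takes the direct path and
 * stays "in the driver" until finish_xmit() is called, which models the low
 * priority thread being preempted. Otherwise the sender only queues its
 * message and returns at once.
 */
static enum submit_result submit(struct tx_model *m)
{
	if (m->xmit_in_progress) {
		m->backlog++;
		return QUEUED;
	}
	m->xmit_in_progress = true;
	return SENT_DIRECT;
}

/*
 * The preempted direct sender finally gets CPU time: its own message goes
 * out and (assumption of this sketch) the backlog is flushed with it.
 * Returns the number of messages put on the wire.
 */
static size_t finish_xmit(struct tx_model *m)
{
	size_t flushed;

	assert(m->xmit_in_progress); /* only valid while a direct send is pending */
	flushed = 1 + m->backlog;
	m->sent += flushed;
	m->backlog = 0;
	m->xmit_in_progress = false;
	return flushed;
}
```

For example, after one submit() that takes the direct path and two more calls
made while that sender is still "preempted", backlog is 2 and sent is 0.
Nothing leaves the node until finish_xmit() runs, which is the delay the
receiving nodes would observe.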