Netdev List


* Re: Traffic shaping - class ID 16bit limit?
From: Miroslav Kratochvil @ 2011-08-25 17:06 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20110825093937.2a8a1457@nehalam.ftrdhcpuser.net>

>> Technically the ClassID seems to be "hardcoded" as a 16bit value, but
>> after some source searching, I haven't found any good reason for it to
>> be 16-bit only.
>
> Granted it was a poor choice in the initial design.
> It is wired into the API and changing it would be quite painful.
>

I had a feeling something like that would come.

If I get it correctly, the API change would consist of:

- some netlink protocol change
- slight modification of qdisc_class_hash
- modifications in all (four?) hierarchical schedulers
- tiny expansion of userspace tc utility

which isn't that painful (except for the CBQ part), but I'm probably
missing something, and presumably the change would take some time to
get mainstream -- probably way more time than writing a hfsc clone
that is controlled using some other interface than tc/netlink. :(

(but hey! I have a topic for school work!)

> You might be able to do the same thing by splitting traffic
> into multiple virtual devices (dummy or ifb) and then doing
> another layer.
>

My scenario looks pretty simple, mostly like a big hashing filter
attached at the device root, flowid'ing the stuff to leaf classes.
Could you please provide some simple illustration of splitting that
into multiple devices? I guess that the main problem with this
approach would be that my subclasses usually don't share anything in
common, especially not any pretty IP prefixes that would allow good
splitting.

Anyway, thanks very much for the response!

-mk

^ permalink raw reply

* Re: Traffic shaping - class ID 16bit limit?
From: Stephen Hemminger @ 2011-08-25 16:39 UTC (permalink / raw)
  To: Miroslav Kratochvil; +Cc: netdev
In-Reply-To: <CAO0uZ+-fv89Z3-9+vh5kN93xe=Uw8b=PSfnAqosOjUBP6PcVNg@mail.gmail.com>

On Thu, 25 Aug 2011 18:28:01 +0200
Miroslav Kratochvil <exa.exa@gmail.com> wrote:

> Technically the ClassID seems to be "hardcoded" as a 16bit value, but
> after some source searching, I haven't found any good reason for it to
> be 16-bit only.

Granted it was a poor choice in the initial design.
It is wired into the API and changing it would be quite painful.

You might be able to do the same thing by splitting traffic
into multiple virtual devices (dummy or ifb) and then doing
another layer.
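
As a rough sketch of the bookkeeping only (not of the tc setup itself),
a flat leaf number could be split into a device index plus a per-device
class minor. The struct, the function name and the choice to leave
minor 0 unused (X:0 names the qdisc itself) are assumptions made for
illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Usable class minors per qdisc, assuming minor 0 is left for the qdisc. */
#define MINORS_PER_DEV 0xFFFFu

/* Hypothetical result type: where a given leaf ends up. */
struct leaf_slot {
	uint32_t dev_index;	/* which virtual device (ifb0, ifb1, ...) */
	uint16_t minor;		/* class minor on that device, 1..0xFFFF */
};

/* Map a flat, zero-based leaf number onto (device, minor). */
static struct leaf_slot leaf_to_slot(uint32_t leaf)
{
	struct leaf_slot s;

	s.dev_index = leaf / MINORS_PER_DEV;
	s.minor = (uint16_t)(leaf % MINORS_PER_DEV + 1);
	return s;
}
```

The first layer would then only have to steer a packet to device
dev_index, and the second layer classifies into the minor on that
device.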

^ permalink raw reply

* Traffic shaping - class ID 16bit limit?
From: Miroslav Kratochvil @ 2011-08-25 16:28 UTC (permalink / raw)
  To: netdev

Hello everyone,

the question is simple: What should I do if I need to have more than
2^16 subclasses of a classful queuing discipline (in, say, hfsc or
htb)?

I bumped into this problem while writing some traffic shaping
software and thinking about scalability. Although there are other ways
to have more than 64k "classes" (like grouping some subclasses into
separate qdiscs), those ways have significant drawbacks (they require
more tc-filter rules and decisions, generally more processing power,
and the structure is quite hard to maintain).

Technically the ClassID seems to be "hardcoded" as a 16bit value, but
after some source searching, I haven't found any good reason for it to
be 16-bit only.

I understand that those IDs are usually handled together with another
16bit Qdisc ID, which would add up to a quite big number (possibly
unpleasant on some architectures) if those were both 32bit.
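
To illustrate what I mean, here is a minimal sketch of how the two
16bit halves share one 32bit handle. The helper names are made up; the
masks follow the 16:16 split used by the TC_H_MAJ_MASK / TC_H_MIN_MASK
definitions in linux/pkt_sched.h, as far as I can tell:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative masks for the 16:16 split of a tc handle. */
#define HANDLE_MAJ_MASK 0xFFFF0000u	/* qdisc id ("major") */
#define HANDLE_MIN_MASK 0x0000FFFFu	/* class id ("minor") */

/* Pack "major:minor", as written on the tc command line, into one u32. */
static uint32_t handle_make(uint16_t major, uint16_t minor)
{
	return ((uint32_t)major << 16) | minor;
}

/* Extract the qdisc part of a packed handle. */
static uint16_t handle_major(uint32_t h)
{
	return (uint16_t)((h & HANDLE_MAJ_MASK) >> 16);
}

/* Extract the class part; this is where the 64k ceiling comes from. */
static uint16_t handle_minor(uint32_t h)
{
	return (uint16_t)(h & HANDLE_MIN_MASK);
}
```

With this layout a class 1:10 (hex) packs to 0x00010010, and nothing
above 0xFFFF fits into the minor.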

I also completely understand that in most cases of common usage
there's absolutely no need to have this many subclasses, but
on the other hand there's still no reason to have "64k classes enough
for everyone". :D

Of course if there's some obvious method to solve this, or a patch, or
some kind of workaround that I haven't found, please let me know about
it, I will happily use it.

Thanks for any suggestions,
Mirek Kratochvil

^ permalink raw reply

* [PATCH net-next] net_sched: sfb: optimize enqueue on full queue
From: Eric Dumazet @ 2011-08-25 16:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

If the SFB queue is full (hard limit reached), there is no point
spending time computing the hash and the maximum qlen/p_mark.

Instead, just drop the packet early.
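
A stand-alone toy version of the idea, for illustration only. This is
not the sch_sfb code; the struct, the field names and the placeholder
hash are all made up:

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the qdisc state. */
struct toy_queue {
	uint32_t qlen;		/* packets currently queued */
	uint32_t limit;		/* hard limit */
	uint32_t queuedrop;	/* drops caused by the hard limit */
	uint32_t hash_calls;	/* how often the expensive path ran */
};

/* Placeholder for the per-packet hashing and bucket bookkeeping. */
static uint32_t toy_expensive_hash(struct toy_queue *q, uint32_t flow)
{
	q->hash_calls++;
	return flow * 2654435761u;
}

/* Returns 0 if the packet was queued, -1 if it was dropped. */
static int toy_enqueue(struct toy_queue *q, uint32_t flow)
{
	if (q->qlen >= q->limit) {
		/* Full: bail out before doing any hashing work. */
		q->queuedrop++;
		return -1;
	}

	(void)toy_expensive_hash(q, flow);
	q->qlen++;
	return 0;
}
```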

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/sched/sch_sfb.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 0a833d0..e83c272 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -287,6 +287,12 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	u32 r, slot, salt, sfbhash;
 	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 
+	if (unlikely(sch->q.qlen >= q->limit)) {
+		sch->qstats.overlimits++;
+		q->stats.queuedrop++;
+		goto drop;
+	}
+
 	if (q->rehash_interval > 0) {
 		unsigned long limit = q->rehash_time + q->rehash_interval;
 
@@ -332,12 +338,9 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	slot ^= 1;
 	sfb_skb_cb(skb)->hashes[slot] = 0;
 
-	if (unlikely(minqlen >= q->max || sch->q.qlen >= q->limit)) {
+	if (unlikely(minqlen >= q->max)) {
 		sch->qstats.overlimits++;
-		if (minqlen >= q->max)
-			q->stats.bucketdrop++;
-		else
-			q->stats.queuedrop++;
+		q->stats.bucketdrop++;
 		goto drop;
 	}
 

^ permalink raw reply related

* Re: [RFC] per-containers tcp buffer limitation
From: Stephen Hemminger @ 2011-08-25 15:44 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Daniel Wagner, Eric W. Biederman, KAMEZAWA Hiroyuki,
	Glauber Costa, Linux Containers, netdev, David Miller,
	Pavel Emelyanov
In-Reply-To: <4E5664B5.6000806@genband.com>

You seem to have forgotten the work of your forefathers. When appealing
to history you must understand it first.

What about using netfilter (with extensions)? We already have an iptables
module to match on uid or gid. It wouldn't be hard to extend this to
other bits of metadata like originating and target containers.

You could also use this to restrict access to ports and hosts on
a per container basis.

^ permalink raw reply

* Re: linux-next: build failure after merge of the staging tree
From: Greg KH @ 2011-08-25 15:39 UTC (permalink / raw)
  To: Larry Finger
  Cc: Stephen Rothwell, linux-next, linux-kernel, wlanfae, Jiri Pirko,
	David Miller, netdev
In-Reply-To: <4E55DA8C.10102@lwfinger.net>

On Thu, Aug 25, 2011 at 12:15:56AM -0500, Larry Finger wrote:
> On 08/25/2011 12:02 AM, Stephen Rothwell wrote:
> >Hi Greg,
> >
> >After merging the staging tree, today's linux-next build (x86_64
> >allmodconfig) failed like this:
> >
> >drivers/staging/rtl8192e/rtl_core.c:2917:2: error: unknown field 'ndo_set_multicast_list' specified in initializer
> >
> >Caused by commit 94a799425eee ("From: wlanfae<wlanfae@realtek.com>" -
> >really "[PATCH 1/8] rtl8192e: Import new version of driver from realtek"
> >Larry, that patch was badly imported ...) interacting with commit
> >b81693d9149c ("net: remove ndo_set_multicast_list callback") from the net
> >tree.
> >
> >I applied the following patch (which seems to be what was done to the
> >other drivers in the net tree - there is probably more required):
> >
> >From: Stephen Rothwell<sfr@canb.auug.org.au>
> >Date: Thu, 25 Aug 2011 14:57:55 +1000
> >Subject: [PATCH] rtl8192e: update for ndo_set_multicast_list removal.
> >
> >Signed-off-by: Stephen Rothwell<sfr@canb.auug.org.au>
> >---
> >  drivers/staging/rtl8192e/rtl_core.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> >diff --git a/drivers/staging/rtl8192e/rtl_core.c b/drivers/staging/rtl8192e/rtl_core.c
> >index f8a13d9..b38f626 100644
> >--- a/drivers/staging/rtl8192e/rtl_core.c
> >+++ b/drivers/staging/rtl8192e/rtl_core.c
> >@@ -2914,7 +2914,7 @@ static const struct net_device_ops rtl8192_netdev_ops = {
> >  	.ndo_stop = rtl8192_close,
> >  	.ndo_tx_timeout = rtl8192_tx_timeout,
> >  	.ndo_do_ioctl = rtl8192_ioctl,
> >-	.ndo_set_multicast_list = r8192_set_multicast,
> >+	.ndo_set_rx_mode = r8192_set_multicast,
> >  	.ndo_set_mac_address = r8192_set_mac_adr,
> >  	.ndo_validate_addr = eth_validate_addr,
> >  	.ndo_change_mtu = eth_change_mtu,
> 
> Stephen,
> 
> Thanks for the notice. It seems that commit b81693d9149c had not
> made it into my copy of staging. I'll look into the issue.

It wouldn't ever make it there, as that's coming from the net-next tree,
so this will have to wait until stuff merges together in Linus's tree.

thanks,

greg k-h

^ permalink raw reply

* RE: [PATCH 3/9] IB: nes: convert to SKB paged frag API.
From: Latif, Faisal @ 2011-08-25 15:33 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Roland Dreier, Hefty, Sean, Hal Rosenstock,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <1314260895-15936-3-git-send-email-ian.campbell@citrix.com>



Acked-by: Faisal Latif <faisal.latif@intel.com>

Thanks.

> ---
>  drivers/infiniband/hw/nes/nes_nic.c |   21 +++++++++++----------
>  1 files changed, 11 insertions(+), 10 deletions(-)

^ permalink raw reply

* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
From: Tom Herbert @ 2011-08-25 15:29 UTC (permalink / raw)
  To: Johannes Berg; +Cc: jhs, davem, netdev
In-Reply-To: <1312809524.4372.29.camel@jlt3.sipsolutions.net>

> Well, the wireless case is curious, and has a whole bunch of corner
> cases, since it's not necessarily PtP, it can be PtMP!
>
> But considering the most basic case of us being a client connecting to
> an AP first: yes, the bandwidth will change dynamically, I don't know
> what impact this has on BQL, Tom, maybe you can think about this a bit?
>
BQL is dynamic, and will increase the queue limit more aggressively
than decrease it.  So for instance, we can track the largest queue
needed over 30 seconds, which should be stable even in the presence
of fluctuating bandwidth.  The thing that worries me is rather
whether the HW queues conform to the queue characteristics described in
the patch.  If transmit completions are random and not regular, BQL
probably can't function well.
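
To make "increase more aggressively than decrease" concrete, here is a
toy sketch. It is not the algorithm from the patches; the struct, the
names and the hold window are invented for illustration, and time is
just a counter in seconds:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit tracker: grows at once, shrinks once per window. */
struct toy_limit {
	uint32_t limit;		/* current byte limit */
	uint32_t window_max;	/* largest demand seen in this window */
	uint64_t window_start;	/* time the current window began */
	uint64_t hold_time;	/* window length, e.g. 30 (seconds) */
};

/* Feed one observation: 'needed' bytes were required at time 'now'. */
static void toy_limit_update(struct toy_limit *l, uint32_t needed,
			     uint64_t now)
{
	if (needed > l->window_max)
		l->window_max = needed;

	/* Grow immediately when demand exceeds the limit. */
	if (needed > l->limit)
		l->limit = needed;

	/* Shrink only when a whole window has passed, and only down
	 * to the peak demand seen during that window. */
	if (now - l->window_start >= l->hold_time) {
		l->limit = l->window_max;
		l->window_max = 0;
		l->window_start = now;
	}
}
```

A short dip in demand therefore leaves the limit alone, while a burst
raises it right away.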

If you'd like to bring this up on some wireless devices that would be
great, I don't have easy access to any right now, but I can try to
help otherwise.


> The second big challenge in wireless is the PtMP case: if we're acting
> as an AP, then we typically have four queues for any number of remote
> endpoints with varying bandwidth. I haven't found a good way to handle
> this, we can't have hardware queues per station (most HW is simply not
> capable of that many queues) but technically we would want to make the
> queue limits depend on the peer...
>
> Since I just returned from vacation I have tons of email to dig through
> I'll have to keep this short for now, but I'm definitely interested.
>
> johannes
>
>

^ permalink raw reply

* [PATCH] cassini: init before use in cas_interruptN.
From: Francois Romieu @ 2011-08-25 15:02 UTC (permalink / raw)
  To: Thomas Jarosch; +Cc: netdev, David S. Miller
In-Reply-To: <201108251558.45290.thomas.jarosch@intra2net.com>

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Spotted-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
---

 David, any opinion regarding the removal of the USE_NAPI #ifdef
 in this driver ?

 drivers/net/cassini.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/cassini.c b/drivers/net/cassini.c
index 646c86b..fdb7a17 100644
--- a/drivers/net/cassini.c
+++ b/drivers/net/cassini.c
@@ -2452,14 +2452,13 @@ static irqreturn_t cas_interruptN(int irq, void *dev_id)
 	struct net_device *dev = dev_id;
 	struct cas *cp = netdev_priv(dev);
 	unsigned long flags;
-	int ring;
+	int ring = (irq == cp->pci_irq_INTC) ? 2 : 3;
 	u32 status = readl(cp->regs + REG_PLUS_INTRN_STATUS(ring));
 
 	/* check for shared irq */
 	if (status == 0)
 		return IRQ_NONE;
 
-	ring = (irq == cp->pci_irq_INTC) ? 2 : 3;
 	spin_lock_irqsave(&cp->lock, flags);
 	if (status & INTR_RX_DONE_ALT) { /* handle rx separately */
 #ifdef USE_NAPI
-- 
1.7.4.4

^ permalink raw reply related

* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
From: Tom Herbert @ 2011-08-25 15:19 UTC (permalink / raw)
  To: jhs; +Cc: davem, netdev, Johannes Berg
In-Reply-To: <1312808784.17202.39.camel@mojatatu>

> For wired connections I think the big deal is in improved
> runtime memory saving (your perf numbers are kinda ok).
> The challenge is going to be with wireless where the underlying
> bandwidth changes (and therefore the optimal queue size varies
> more frequently). The problem with active queue management is
> getting the feedback loop to be more accurate and i think there
> will be challenges with wired devices.

The important characteristic (for us at least) will be reduced latency
for high priority packets (for NICs that don't support qos multiQ).  I
do have data showing those benefits, but it's a little old.  I will
have something to present at LPC.

> I notice that you dont have any wireless devices;
> but it would be nice for someone to check this out on wireless.
> CCing Johannes - maybe he has some insight.
>
Yeah, these scare me ;-)

> cheers,
> jamal
>
>
>
>

^ permalink raw reply

* Re: [RFC] per-containers tcp buffer limitation
From: Chris Friesen @ 2011-08-25 15:05 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Eric W. Biederman, KAMEZAWA Hiroyuki, Glauber Costa,
	Linux Containers, netdev, David Miller, Pavel Emelyanov
In-Reply-To: <4E56464B.4070304@monom.org>

On 08/25/2011 06:55 AM, Daniel Wagner wrote:

> I'd like to solve a use case where it is necessary to count all bytes
> transmitted and received by an application [1]. So far I have found two
> unsatisfying solution for it. The first one is to hook into libc and
> count the bytes there. I don't think I have to say I don't like this.

Is there any particular reason you can't use LD_PRELOAD to interpose a 
library to do the statistics monitoring?

Chris

-- 
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com

^ permalink raw reply

* Re: [PATCH 2/9] IB: amso1100: convert to SKB paged frag API.
From: Steve Wise @ 2011-08-25 14:42 UTC (permalink / raw)
  To: Ian Campbell
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Tom Tucker, Roland Dreier,
	Sean Hefty, Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1314260895-15936-2-git-send-email-ian.campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>

Acked-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* cassini driver: Use of uninitialized memory
From: Thomas Jarosch @ 2011-08-25 13:58 UTC (permalink / raw)
  To: netdev

Hello,

the interrupt routine of the cassini driver
currently looks like this:

----------------------
static irqreturn_t cas_interruptN(int irq, void *dev_id)
{
	struct net_device *dev = dev_id;
	struct cas *cp = netdev_priv(dev);
	unsigned long flags;
	int ring;
	u32 status = readl(cp->regs + REG_PLUS_INTRN_STATUS(ring));
...
----------------------

-> "ring" isn't initialized properly and gets used
in REG_PLUS_INTRN_STATUS. Some lines below there's this:

----------------------
	ring = (irq == cp->pci_irq_INTC) ? 2 : 3;
----------------------

Should that line be moved before the readl() call
or should "ring" be initialized with zero?
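
If moving the line is the right answer, the shape would be roughly as
in this stand-alone toy. The struct and function names are made up and
the "register file" is just an array, so this only shows the ordering,
not the driver code:

```c
#include <assert.h>
#include <stdint.h>

/* Toy device: not the real register layout. */
struct toy_dev {
	int irq_intc;		/* IRQ number that belongs to ring 2 */
	uint32_t status[4];	/* per-ring interrupt status words */
};

/* Return the status word of the ring that raised 'irq'. */
static uint32_t toy_ring_status(const struct toy_dev *d, int irq)
{
	/* Pick the ring first; the status read depends on it. */
	int ring = (irq == d->irq_intc) ? 2 : 3;

	return d->status[ring];
}
```

Initializing with zero would presumably make the read go to ring 0's
status, which is not what the later assignment suggests is intended.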

Credit for spotting this goes to cppcheck.

Cheers,
Thomas

^ permalink raw reply

* Re: iwlagn: Random "Time out reading EEPROM".
From: Nicolas de Pesloüan @ 2011-08-25 13:50 UTC (permalink / raw)
  To: wwguy
  Cc: dhalperi-GmWTxIRN22iJaUV4rX00uodd74u8MsAO@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, wireless
In-Reply-To: <1310740111.13897.13.camel@wwguy-ubuntu>

Le 15/07/2011 16:28, wwguy a écrit :

> the error indicate fail to read data from EEPROM, your 2nd report is
> even more strange, the number at the end the error message indicate the
> index of DWORD driver trying to read from EEPROM.
>
> "Time out reading EEPROM[2]" telling me the first 2 DWORD is reading ok
> but not the 3rd read.
>
> How many PCI-E slots you have in your system, could it possible for you
> to switch to another PCI-E slot, or pull out and re-insert the NIC.

Unfortunately, not. On this laptop, the NIC is not reachable without disassembling the laptop, and I 
don't want to... I will double check again, but...

> Also, it is possible to put the NIC into different system and see if you
> are seeing the similar problem?

No, for the exact same reason.

Note that it still happens with 3.0.0-1 from Debian.

[   15.086244] iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:
[   15.086247] iwlagn: Copyright(c) 2003-2011 Intel Corporation
[   15.086404] iwlagn 0000:05:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[   15.086412] iwlagn 0000:05:00.0: setting latency timer to 64
[   15.086438] iwlagn 0000:05:00.0: Detected Intel(R) WiFi Link 5100 AGN, REV=0x54
[   15.095859] iwlagn 0000:05:00.0: Time out reading EEPROM[6]
[   15.095945] iwlagn 0000:05:00.0: Unable to init EEPROM
[   15.096030] iwlagn 0000:05:00.0: PCI INT A disabled
[   15.096039] iwlagn: probe of 0000:05:00.0 failed with error -110

modprobe -r iwlagn ; modprobe iwlagn

[  231.822492] iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:
[  231.822495] iwlagn: Copyright(c) 2003-2011 Intel Corporation
[  231.822581] iwlagn 0000:05:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[  231.822591] iwlagn 0000:05:00.0: setting latency timer to 64
[  231.822621] iwlagn 0000:05:00.0: Detected Intel(R) WiFi Link 5100 AGN, REV=0x54
[  231.843544] iwlagn 0000:05:00.0: device EEPROM VER=0x11f, CALIB=0x4
[  231.843546] iwlagn 0000:05:00.0: Device SKU: 0Xb
[  231.844889] iwlagn 0000:05:00.0: Tunable channels: 13 802.11bg, 24 802.11a channels
[  231.844961] iwlagn 0000:05:00.0: irq 50 for MSI/MSI-X
[  231.989424] iwlagn 0000:05:00.0: loaded firmware version 8.83.5.1 build 33692
[  232.037456] ieee80211 phy0: Selected rate control algorithm 'iwl-agn-rs'

The error is not easy to reproduce, but the fix is perfectly stable. A single unload/reload of 
iwlagn is always enough to solve the problem, when it happens. For this reason, it sounds difficult 
to consider this a hardware slot problem. Can't this be related to some other PCI components?

00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
01:00.0 VGA compatible controller: nVidia Corporation G98 [GeForce 9300M GS] (rev a1)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8055 PCI-E Gigabit Ethernet Controller (rev 14)
05:00.0 Network controller: Intel Corporation WiFi Link 5100
09:03.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 05)
09:03.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
09:03.2 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)

I'm quite sure I can fix this problem by loading, unloading and reloading iwlagn on every startup... 
but I don't really consider this a fix :-/

	Nicolas.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: iwlagn: Random "Time out reading EEPROM".
From: Guy, Wey-Yi @ 2011-08-25 13:37 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: dhalperi-GmWTxIRN22iJaUV4rX00uodd74u8MsAO@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, wireless
In-Reply-To: <4E565333.3080007-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Thu, 2011-08-25 at 06:50 -0700, Nicolas de Pesloüan wrote:
> Le 15/07/2011 16:28, wwguy a écrit :
> 
> > the error indicate fail to read data from EEPROM, your 2nd report is
> > even more strange, the number at the end the error message indicate the
> > index of DWORD driver trying to read from EEPROM.
> >
> > "Time out reading EEPROM[2]" telling me the first 2 DWORD is reading ok
> > but not the 3rd read.
> >
> > How many PCI-E slots you have in your system, could it possible for you
> > to switch to another PCI-E slot, or pull out and re-insert the NIC.
> 
> Unfortunately, not. On this laptop, the NIC is not reachable without disassembling the laptop, and I 
> don't want to... I will double check again, but...
> 
> > Also, it is possible to put the NIC into different system and see if you
> > are seeing the similar problem?
> 
> No, for the exact same reason.
> 
> Note that it still happens with 3.0.0-1 from Debian.
> 
> [   15.086244] iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:
> [   15.086247] iwlagn: Copyright(c) 2003-2011 Intel Corporation
> [   15.086404] iwlagn 0000:05:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
> [   15.086412] iwlagn 0000:05:00.0: setting latency timer to 64
> [   15.086438] iwlagn 0000:05:00.0: Detected Intel(R) WiFi Link 5100 AGN, REV=0x54
> [   15.095859] iwlagn 0000:05:00.0: Time out reading EEPROM[6]
> [   15.095945] iwlagn 0000:05:00.0: Unable to init EEPROM
> [   15.096030] iwlagn 0000:05:00.0: PCI INT A disabled
> [   15.096039] iwlagn: probe of 0000:05:00.0 failed with error -110
> 
> modprobe -r iwlagn ; modprobe iwlagn
> 
> [  231.822492] iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:
> [  231.822495] iwlagn: Copyright(c) 2003-2011 Intel Corporation
> [  231.822581] iwlagn 0000:05:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
> [  231.822591] iwlagn 0000:05:00.0: setting latency timer to 64
> [  231.822621] iwlagn 0000:05:00.0: Detected Intel(R) WiFi Link 5100 AGN, REV=0x54
> [  231.843544] iwlagn 0000:05:00.0: device EEPROM VER=0x11f, CALIB=0x4
> [  231.843546] iwlagn 0000:05:00.0: Device SKU: 0Xb
> [  231.844889] iwlagn 0000:05:00.0: Tunable channels: 13 802.11bg, 24 802.11a channels
> [  231.844961] iwlagn 0000:05:00.0: irq 50 for MSI/MSI-X
> [  231.989424] iwlagn 0000:05:00.0: loaded firmware version 8.83.5.1 build 33692
> [  232.037456] ieee80211 phy0: Selected rate control algorithm 'iwl-agn-rs'
> 
> The error is not easy to reproduce, but the fix is perfectly stable. A single unload/reload of 
> iwlagn is always enough to solve the problem, when it happens. For this reason, it sounds difficult 
> to consider this a hardware slot problem. Can't this be related to some other PCI components?
> 
> 00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
> 00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07)
> 00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
> 00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
> 00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
> 00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
> 00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
> 00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
> 00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
> 00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 03)
> 00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
> 00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 03)
> 00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 03)
> 00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
> 00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
> 00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
> 00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
> 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
> 00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03)
> 00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03)
> 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
> 01:00.0 VGA compatible controller: nVidia Corporation G98 [GeForce 9300M GS] (rev a1)
> 02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8055 PCI-E Gigabit Ethernet Controller (rev 14)
> 05:00.0 Network controller: Intel Corporation WiFi Link 5100
> 09:03.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 05)
> 09:03.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
> 09:03.2 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
> 
> I'm quite sure I can fix this problem by loading, unloading and reloading iwlagn on every startup... 
> but I don't really consider this a fix :-/

I'm not sure how to help, since it is not easy to reproduce and it is an
EEPROM reading problem. I can only guess it might be related to the
physical device.

Thanks
Wey



^ permalink raw reply

* Re: TCP port firewall controlled by UDP packets
From: Pavel Machek @ 2011-08-25 13:19 UTC (permalink / raw)
  To: Tonda; +Cc: davem, kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <1313106172-18455-1-git-send-email-as@strmilov.cz>

No comments, variables named in Czech.

Ok for me but...

But the first thing would be a description of what it is good for...?

							Pavel

On Fri 2011-08-12 01:42:52, Tonda wrote:
>  	  If unsure, say N.
> +
> +config TCPFIREWALL
> +	tristate "TCP Firewall controlled by UDP queries"
> +	depends on m
> diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
> --- a/net/ipv4/Makefile
> +++ b/net/ipv4/Makefile
> @@ -51,3 +51,4 @@
>  
>  obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \
>  		      xfrm4_output.o
> +obj-$(CONFIG_TCPFIREWALL) += tcpfirewall/
> diff --git a/net/ipv4/tcpfirewall/Makefile b/net/ipv4/tcpfirewall/Makefile
> --- a/net/ipv4/tcpfirewall/Makefile
> +++ b/net/ipv4/tcpfirewall/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_TCPFIREWALL) += tcpfirewall.o
> diff --git a/net/ipv4/tcpfirewall/tcpfirewall.c b/net/ipv4/tcpfirewall/tcpfirewall.c
> --- a/net/ipv4/tcpfirewall/tcpfirewall.c
> +++ b/net/ipv4/tcpfirewall/tcpfirewall.c
> @@ -0,0 +1,451 @@
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +#include <linux/skbuff.h>
> +#include <linux/in.h>
> +#include <linux/if_packet.h>
> +#include <linux/tcp.h>
> +#include <linux/udp.h>
> +#include <net/tcp.h>
> +#include <net/udp.h>
> +
> +struct net_protocol {
> +	int (*handler)(struct sk_buff *skb);
> +	void (*err_handler)(struct sk_buff *skb, u32 info);
> +	int (*gso_send_check)(struct sk_buff *skb);
> +	struct sk_buff *(*gso_segment)(struct sk_buff *skb,
> +		u32 features);
> +	struct sk_buff **(*gro_receive)(struct sk_buff **head,
> +		struct sk_buff *skb);
> +	int (*gro_complete)(struct sk_buff *skb);
> +	unsigned int no_policy:1,
> +		netns_ok:1;
> +};
> +
> +MODULE_LICENSE("GPL");
> +
> +static unsigned long inet_protos = 0x01234567;
> +
> +struct net_protocol **_inet_protos;
> +
> +module_param(inet_protos, ulong, 0);
> +
> +static int *otviraky;
> +static int *zaviraky;
> +
> +static int pocetotviraku;
> +static int pocetzaviraku;
> +static int stav;
> +static int packetcounter;
> +static int tcpport;
> +static int open;
> +static int firewall;
> +
> +int (*tcpv4recv) (struct sk_buff *skb);
> +int (*udprecv) (struct sk_buff *skb);
> +
> +int udpcontroller(struct sk_buff *skb)
> +{
> +	const struct udphdr *uh;
> +
> +	if (skb->pkt_type != PACKET_HOST) {
> +		kfree_skb(skb);
> +		return 0;
> +	}
> +
> +	if (!pskb_may_pull(skb, sizeof(struct tcphdr))) {
> +		kfree_skb(skb);
> +		return 0;
> +	}
> +
> +	uh = udp_hdr(skb);
> +
> +	if (pocetotviraku == 0)
> +		return udprecv(skb);
> +
> +	if (!open) {
> +		if (uh->dest == otviraky[stav]) {
> +			++stav;
> +			packetcounter = 0;
> +
> +			if (stav == pocetotviraku) {
> +				open = 1;
> +				stav = 0;
> +			}
> +		} else {
> +			if (packetcounter <= 16) {
> +				++packetcounter;
> +				if (packetcounter > 16)
> +					stav = 0;
> +			}
> +		}
> +	} else {
> +		if (uh->dest == zaviraky[stav]) {
> +			++stav;
> +			packetcounter = 0;
> +
> +			if (stav == pocetzaviraku) {
> +				open = 0;
> +				stav = 0;
> +			}
> +		} else {
> +			if (packetcounter <= 16) {
> +				++packetcounter;
> +				if (packetcounter > 16)
> +					stav = 0;
> +			}
> +		}
> +	}
> +
> +
> +	return udprecv(skb);
> +}
> +
> +int tcpfirewall(struct sk_buff *skb)
> +{
> +	const struct tcphdr *th;
> +
> +	if (skb->pkt_type != PACKET_HOST) {
> +		kfree_skb(skb);
> +		return 0;
> +	}
> +
> +	if (!pskb_may_pull(skb, sizeof(struct tcphdr))) {
> +		kfree_skb(skb);
> +		return 0;
> +	}
> +
> +	th = tcp_hdr(skb);
> +
> +	if (th->dest == tcpport) {
> +		if (firewall == 1 && !open) {
> +			/*tcpv4sendreset(NULL, skb);*/
> +			kfree_skb(skb);
> +			return 0;
> +		}
> +	}
> +
> +	return tcpv4recv(skb);
> +}
> +
> +static struct net_protocol *zalohatcp;
> +static struct net_protocol *zalohaudp;
> +static struct net_protocol mytcp;
> +static struct net_protocol myudp;
> +
> +static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buffer)
> +{
> +	if (!strcmp(attr->name, "firewall")) {
> +		if (firewall)
> +			buffer[0] = '1';
> +		else
> +			buffer[0] = '0';
> +
> +		buffer[1] = '\n';
> +		return 2;
> +	}
> +
> +	if (!strcmp(attr->name, "tcpport")) {
> +		sprintf(buffer, "%d\n", ntohs(tcpport));
> +		return strlen(buffer)+1;
> +	}
> +
> +	if (!strcmp(attr->name, "openers")) {
> +		int i;
> +		char *znak;
> +		if (pocetotviraku == 0)
> +			return 0;
> +		buffer[0] = '\0';
> +		znak = kmalloc(10, GFP_KERNEL);
> +		for (i = 0; i < pocetotviraku; ++i) {
> +			sprintf(znak, "%d ", ntohs(otviraky[i]));
> +			strcat(buffer, znak);
> +		}
> +		kfree(znak);
> +		buffer[strlen(buffer)-1] = '\n';
> +		return strlen(buffer);
> +	}
> +
> +	if (!strcmp(attr->name, "closers")) {
> +		int i;
> +		char *znak;
> +		if (pocetzaviraku == 0)
> +			return 0;
> +		buffer[0] = '\0';
> +		znak = kmalloc(10, GFP_KERNEL);
> +		for (i = 0; i < pocetzaviraku; ++i) {
> +			sprintf(znak, "%d ", ntohs(zaviraky[i]));
> +			strcat(buffer, znak);
> +		}
> +		kfree(znak);
> +		buffer[strlen(buffer)-1] = '\n';
> +		return strlen(buffer);
> +	}
> +
> +	if (!strcmp(attr->name, "open")) {
> +		if (open)
> +			buffer[0] = '1';
> +		else
> +			buffer[0] = '0';
> +
> +		buffer[1] = '\n';
> +		return 2;
> +	}
> +
> +	if (!strcmp(attr->name, "state")) {
> +		/* sprintf() returns the length without the trailing NUL */
> +		return sprintf(buffer, "%d\n", stav);
> +	}
> +
> +	if (!strcmp(attr->name, "counter")) {
> +		return sprintf(buffer, "%d\n", packetcounter);
> +	}
> +
> +	return 0;
> +}
> +
> +static ssize_t store(struct kobject *kobj, struct attribute *attr,
> +	const char *buffer, size_t size)
> +{
> +	int i;
> +	char *cislo;
> +	if (!strcmp(attr->name, "firewall")) {
> +		if (size > 0 && buffer[0] == '1')
> +			firewall = 1;
> +		else
> +			firewall = 0;
> +		stav = 0;
> +		return size;
> +	}
> +
> +	if (!strcmp(attr->name, "tcpport")) {
> +		cislo = kmalloc(size+1, GFP_KERNEL);
> +		for (i = 0; i < size; ++i)
> +			cislo[i] = buffer[i];
> +		cislo[size] = '\0';
> +		if (kstrtoint(cislo, 10, &i) < 0)
> +			i = -1;
> +		if (i > 0 && i < 65536)
> +			tcpport = htons(i);
> +		kfree(cislo);
> +		stav = 0;
> +		return size;
> +	}
> +
> +	if (!strcmp(attr->name, "openers")) {
> +		int udpport, i;
> +		int *noveotviraky;
> +		int *stareotviraky;
> +		cislo = kmalloc(size+1, GFP_KERNEL);
> +		for (i = 0; i < size; ++i)
> +			cislo[i] = buffer[i];
> +		cislo[size] = '\0';
> +
> +		if (!strcmp(cislo, "reset") || !strcmp(cislo, "reset\n")) {
> +			if (pocetotviraku)
> +				kfree(otviraky);
> +			pocetotviraku = 0;
> +		}
> +
> +		if (kstrtoint(cislo, 10, &i) < 0)
> +			i = -1;
> +		kfree(cislo);
> +
> +		/* stored ports are in network byte order */
> +		if (i > 0 && i < 65536 && (pocetotviraku == 0 ||
> +		    otviraky[pocetotviraku-1] != htons(i)))
> +			udpport = htons(i);
> +		else
> +			return size;
> +
> +		if (pocetotviraku < 10) {
> +			noveotviraky = kmalloc((pocetotviraku+1)*sizeof(int),
> +				GFP_KERNEL);
> +
> +			for (i = 0; i < pocetotviraku; ++i)
> +				noveotviraky[i] = otviraky[i];
> +
> +			noveotviraky[pocetotviraku] = udpport;
> +			stareotviraky = otviraky;
> +			otviraky = noveotviraky;
> +			if (pocetotviraku)
> +				kfree(stareotviraky);
> +
> +			++pocetotviraku;
> +		}
> +		stav = 0;
> +		return size;
> +	}
> +
> +	if (!strcmp(attr->name, "closers")) {
> +		int udpport, i;
> +		int *novezaviraky;
> +		int *starezaviraky;
> +		cislo = kmalloc(size+1, GFP_KERNEL);
> +		for (i = 0; i < size; ++i)
> +			cislo[i] = buffer[i];
> +		cislo[size] = '\0';
> +
> +		if (!strcmp(cislo, "reset") || !strcmp(cislo, "reset\n")) {
> +			if (pocetzaviraku)
> +				kfree(zaviraky);
> +			pocetzaviraku = 0;
> +		}
> +
> +		if (kstrtoint(cislo, 10, &i) < 0)
> +			i = -1;
> +		kfree(cislo);
> +
> +		/* stored ports are in network byte order */
> +		if (i > 0 && i < 65536 && (pocetzaviraku == 0 ||
> +		    zaviraky[pocetzaviraku-1] != htons(i)))
> +			udpport = htons(i);
> +		else
> +			return size;
> +
> +		if (pocetzaviraku < 10) {
> +			novezaviraky = kmalloc((pocetzaviraku+1)*sizeof(int),
> +				GFP_KERNEL);
> +
> +			for (i = 0; i < pocetzaviraku; ++i)
> +				novezaviraky[i] = zaviraky[i];
> +
> +			novezaviraky[pocetzaviraku] = udpport;
> +			starezaviraky = zaviraky;
> +			zaviraky = novezaviraky;
> +			if (pocetzaviraku)
> +				kfree(starezaviraky);
> +
> +			++pocetzaviraku;
> +		}
> +		stav = 0;
> +		return size;
> +	}
> +
> +	if (!strcmp(attr->name, "open")) {
> +		if (size > 0 && buffer[0] == '1')
> +			open = 1;
> +		else
> +			open = 0;
> +
> +		stav = 0;
> +		return size;
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct sysfs_ops so = {
> +	.show = show,
> +	.store = store,
> +};
> +
> +static struct kobj_type khid = {
> +	.sysfs_ops = &so,
> +};
> +
> +static struct kobject kobj;
> +
> +static const struct attribute fw = {
> +	.name = "firewall",
> +	.mode = S_IRWXU,
> +};
> +
> +static const struct attribute opn = {
> +	.name = "open",
> +	.mode = S_IRWXU,
> +};
> +
> +static const struct attribute tcpp = {
> +	.name = "tcpport",
> +	.mode = S_IRWXU,
> +};
> +
> +static const struct attribute openers = {
> +	.name = "openers",
> +	.mode = S_IRWXU,
> +};
> +
> +static const struct attribute closers = {
> +	.name = "closers",
> +	.mode = S_IRWXU,
> +};
> +
> +static const struct attribute stat = {
> +	.name = "state",
> +	.mode = S_IRUSR,
> +};
> +
> +static const struct attribute counte = {
> +	.name = "counter",
> +	.mode = S_IRUSR,
> +};
> +
> +static int __init start(void)
> +{
> +	if (inet_protos == 0x01234567) {
> +		printk(KERN_WARNING "inet_protos parameter was not specified!\n");
> +		printk(KERN_WARNING "Read its value from the System.map file"
> +			" and insert the module again!\n");
> +		return -EINVAL;
> +	}
> +
> +	pocetotviraku = 0;
> +	pocetzaviraku = 0;
> +	stav = -1;
> +	packetcounter = 0;
> +	tcpport = 0;
> +	open = 1;
> +	firewall = 0;
> +
> +	memset(&kobj, 0, sizeof(struct kobject));
> +
> +	_inet_protos = (struct net_protocol **)inet_protos;
> +
> +	kobject_init(&kobj, &khid);
> +	if (kobject_add(&kobj, NULL, "tcpfirewall") < 0)
> +		printk(KERN_ERR "kobject_add failed");
> +
> +	if (sysfs_create_file(&kobj, &fw) < 0)
> +		printk(KERN_ERR "sysfs_create_file failed");
> +	if (sysfs_create_file(&kobj, &opn) < 0)
> +		printk(KERN_ERR "sysfs_create_file failed");
> +	if (sysfs_create_file(&kobj, &tcpp) < 0)
> +		printk(KERN_ERR "sysfs_create_file failed");
> +	if (sysfs_create_file(&kobj, &openers) < 0)
> +		printk(KERN_ERR "sysfs_create_file failed");
> +	if (sysfs_create_file(&kobj, &closers) < 0)
> +		printk(KERN_ERR "sysfs_create_file failed");
> +	if (sysfs_create_file(&kobj, &stat) < 0)
> +		printk(KERN_ERR "sysfs_create_file failed");
> +	if (sysfs_create_file(&kobj, &counte) < 0)
> +		printk(KERN_ERR "sysfs_create_file failed");
> +
> +	zalohatcp = _inet_protos[IPPROTO_TCP];
> +	zalohaudp = _inet_protos[IPPROTO_UDP];
> +	mytcp = *zalohatcp;
> +	myudp = *zalohaudp;
> +	tcpv4recv = mytcp.handler;
> +	udprecv = myudp.handler;
> +	mytcp.handler = tcpfirewall;
> +	myudp.handler = udpcontroller;
> +	_inet_protos[IPPROTO_TCP] = &mytcp;
> +	_inet_protos[IPPROTO_UDP] = &myudp;
> +	return 0;
> +}
> +
> +static void konec(void)
> +{
> +	_inet_protos[IPPROTO_TCP] = zalohatcp;
> +	_inet_protos[IPPROTO_UDP] = zalohaudp;
> +
> +	if (pocetotviraku)
> +		kfree(otviraky);
> +	if (pocetzaviraku)
> +		kfree(zaviraky);
> +
> +	kobject_del(&kobj);
> +}
> +
> +module_init(start);
> +module_exit(konec);

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH net-next v5 1/2] af-packet: Added TPACKET_V3 headers.
From: chetan loke @ 2011-08-25 12:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110824.194308.2024908890526228700.davem@davemloft.net>

On Wed, Aug 24, 2011 at 10:43 PM, David Miller <davem@davemloft.net> wrote:

> Applied.
>
> I would suggest, as a follow-up patch, we add some appropriate
> prefixes to these new datastructures added to if_packet.h as
> these are exposed to userspace.
>

Sure.

> For example "hdr_v1", "bd_ts", "bd_header_u", and "block_desc" are
> just asking for namespace conflicts with some other API in
> userspace or the user's own datastructures.
>

Then, just to be consistent, I will prefix them with 'tpacket'.


thanks
Chetan Loke

^ permalink raw reply

* Re: [RFC] per-containers tcp buffer limitation
From: Daniel Wagner @ 2011-08-25 12:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Pavel Emelyanov, netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	David Miller
In-Reply-To: <m14o16qlq1.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>

Hi

On 08/25/2011 04:16 AM, Eric W. Biederman wrote:
> KAMEZAWA Hiroyuki<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>  writes:
>
>> On Wed, 24 Aug 2011 22:28:59 -0300
>> Glauber Costa<glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>  wrote:
>>
>>> On 08/24/2011 09:35 PM, Eric W. Biederman wrote:
>>>> Glauber Costa<glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>   writes:
>>> Hi Eric,
>>>
>>> Thanks for your attention.
>>>
>>> So, what you propose was my first implementation. I ended up
>>> throwing it away after playing with it for a while.
>>>
>>> One of the first problems that arises from that is that the sysctls are
>>> a tunable visible from inside the container. Those limits, however, are
>>> to be set from the outside world. The code is not much better than that
>>> either, and instead of creating new cgroup structures and linking them
>>> to the protocol, we end up doing it for net ns. We end up increasing
>>> structures just the same...
>
> You don't need to add a netns member to sockets.
>
> But I do agree that there are odd permission issues with using the
> existing sysctls and making them per namespace.
>
> However almost everything I have seen with memory limits I have found
> very strange.  They all seem like a very bad version of disabling memory
> over commits.

Please apply the same rules for not cursing my family any further than 
the 3rd generation for my idea:

I'd like to solve a use case where it is necessary to count all bytes 
transmitted and received by an application [1]. So far I have found two 
unsatisfying solutions for it. The first one is to hook into libc and 
count the bytes there. I don't think I have to say I don't like this.

The second idea was to use the trick Google has used for Android [2]. 
They add a hook into __sock_sendmsg and __sock_recvmsg and then count 
the bytes per UID. To get this working, every application has to use a 
unique UID. So not very nice either.

After reading up a bit on cgroups, I think that would be the right place 
to count the traffic. Unfortunately, with net_cls I can count the 
outgoing traffic but not the incoming one. If I understood Glauber's 
approach correctly, adding some statistics counters would be easy to do. 
Of course, I don't know the impact of this.

thanks,
daniel

[1] 
http://lists.freedesktop.org/archives/systemd-devel/2011-August/003093.html

[2] 
http://xf.iksaif.net/dev/android/android-2.6.29-to-2.6.32/0083-uidstat-Adding-uid-stat-driver-to-collect-network-st.patch

^ permalink raw reply

* [patch net-next-2.6] benet: remove bogus "unlikely" on vlan check
From: Jiri Pirko @ 2011-08-25 12:50 UTC (permalink / raw)
  To: netdev
  Cc: davem, eric.dumazet, sathya.perla, subbu.seetharaman,
	ajit.khaparde, ivecera

Use of unlikely in this place is wrong. Remove it.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index fb2eda0..3d55b47 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1139,7 +1139,7 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 		skb->rxhash = rxcp->rss_hash;
 
 
-	if (unlikely(rxcp->vlanf))
+	if (rxcp->vlanf)
 		__vlan_hwaccel_put_tag(skb, rxcp->vlan_tag);
 
 	netif_receive_skb(skb);
@@ -1196,7 +1196,7 @@ static void be_rx_compl_process_gro(struct be_adapter *adapter,
 	if (adapter->netdev->features & NETIF_F_RXHASH)
 		skb->rxhash = rxcp->rss_hash;
 
-	if (unlikely(rxcp->vlanf))
+	if (rxcp->vlanf)
 		__vlan_hwaccel_put_tag(skb, rxcp->vlan_tag);
 
 	napi_gro_frags(&eq_obj->napi);
-- 
1.7.6

^ permalink raw reply related

* Re: how to distribute irqs of ixgbevf
From: J.Hwan Kim @ 2011-08-25 10:19 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1314260481.2387.10.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On 2011-08-25 17:21, Eric Dumazet wrote:
> On Thursday, 25 August 2011 at 17:07 +0900, J.Hwan Kim wrote:
>> Hi, everyone
>>
>> The interrupts of my ixgbevf driver occur only on core 0,
>> although the user-space "irqbalance" service is running.
>>
>> How can I distribute the RX interrupts of ixgbevf across all cores?
>>
>> cat /proc/interrupts | grep "isv"
>>   97:     8     0     0     0     0     0     0     0   PCI-MSI-edge   isv0-rx-0
>>   99:     7     0     0     0     0     0     0     0   PCI-MSI-edge   isv0:lsc
>>  103:  2059     0     0     0     0     0     0     0   PCI-MSI-edge   isv2-rx-0
>>  104:    14     0     0     0     0     0     0     0   PCI-MSI-edge   isv2-tx-0
>>  105:     1     0     0     0     0     0     0     0   PCI-MSI-edge   isv2:mbx
>>
>> "isv" is the netdevice name of my ixgbevf.
>>
>>
> Given that the load is very small, irqbalance chose to send the
> interrupts to a single CPU.

This is the CPU load measured by "top"; the machine has 8 cores.


  PID USER  PR  NI  VIRT  RES  SHR  S  %CPU  %MEM     TIME+  COMMAND
    3 root  20   0     0    0    0  R    99   0.0  70:05.48  ksoftirqd/0

^ permalink raw reply

* Re: [PATCH 3/5] SUNRPC: make RPC service dependable on rpcbind clients creation
From: Stanislav Kinsbursky @ 2011-08-25 10:18 UTC (permalink / raw)
  To: Trond.Myklebust@netapp.com
  Cc: linux-nfs@vger.kernel.org, Pavel Emelianov, neilb@suse.de,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bfields@fieldses.org, davem@davemloft.net
In-Reply-To: <20110824183359.4924.94364.stgit@localhost6.localdomain6>

This patch has a flaw: the rpcbind clients have to be put in case of an error in __svc_create().
So there will be a second version.

On 24.08.2011 22:33, Stanislav Kinsbursky wrote:
> We create the rpcbind clients, or increase their user counter, during RPC
> service creation, and decrease this counter (possibly destroying those
> clients) on RPC service destruction.
>
> Signed-off-by: Stanislav Kinsbursky<skinsbursky@parallels.com>
>
> ---
>   include/linux/sunrpc/clnt.h |    2 ++
>   net/sunrpc/rpcb_clnt.c      |    2 +-
>   net/sunrpc/svc.c            |    5 +++++
>   3 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
> index db7bcaf..65a8115 100644
> --- a/include/linux/sunrpc/clnt.h
> +++ b/include/linux/sunrpc/clnt.h
> @@ -135,10 +135,12 @@ void		rpc_shutdown_client(struct rpc_clnt *);
>   void		rpc_release_client(struct rpc_clnt *);
>   void		rpc_task_release_client(struct rpc_task *);
>
> +int		rpcb_create_local(void);
>   int		rpcb_register(u32, u32, int, unsigned short);
>   int		rpcb_v4_register(const u32 program, const u32 version,
>   				 const struct sockaddr *address,
>   				 const char *netid);
> +void		rpcb_put_local(void);
>   void		rpcb_getport_async(struct rpc_task *);
>
>   void		rpc_call_start(struct rpc_task *);
> diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
> index b4cc0f1..437ec60 100644
> --- a/net/sunrpc/rpcb_clnt.c
> +++ b/net/sunrpc/rpcb_clnt.c
> @@ -318,7 +318,7 @@ out:
>    * Returns zero on success, otherwise a negative errno value
>    * is returned.
>    */
> -static int rpcb_create_local(void)
> +int rpcb_create_local(void)
>   {
>   	static DEFINE_MUTEX(rpcb_create_local_mutex);
>   	int result = 0;
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 6a69a11..0df8532 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -367,6 +367,9 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
>   	unsigned int xdrsize;
>   	unsigned int i;
>
> +	if (rpcb_create_local()<  0)
> +		return NULL;
> +
>   	if (!(serv = kzalloc(sizeof(*serv), GFP_KERNEL)))
>   		return NULL;
>   	serv->sv_name      = prog->pg_name;
> @@ -491,6 +494,8 @@ svc_destroy(struct svc_serv *serv)
>   	svc_unregister(serv);
>   	kfree(serv->sv_pools);
>   	kfree(serv);
> +
> +	rpcb_put_local();
>   }
>   EXPORT_SYMBOL_GPL(svc_destroy);
>
>


-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply

* Re: [PATCH] tcp: bound RTO to minimum
From: Arnd Hannemann @ 2011-08-25 10:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Zimmermann, Yuchung Cheng, Hagen Paul Pfeifer, netdev,
	Lukowski Damian
In-Reply-To: <1314266562.2387.35.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

Hi Eric,

On 25.08.2011 12:02, Eric Dumazet wrote:
> On Thursday, 25 August 2011 at 11:46 +0200, Arnd Hannemann wrote:
>> Hi Eric,
>>
>> On 25.08.2011 11:09, Eric Dumazet wrote:
> 
>>> Maybe we should refine the thing a bit, to not reverse backoff unless
>>> rto is > some_threshold.
>>>
>>> Say 10s being the value, that would give at most 92 tries.
>>
>> I personally think that 10s would be too large and eliminate the benefit of the
>> algorithm, so I would prefer a different solution.
>>
>> In the case of one bulk-data TCP session that was transmitting hundreds of
>> packets/s before the connectivity disruption, the worst-case rate of
>> 5 packets/s really seems conservative enough.
>>
>> However, in the case of many idle connections that were transmitting only
>> a number of packets per minute, we might increase the rate drastically for
>> a certain period until it throttles down. You are saying that we have a
>> problem here, correct?
>>
>> Do you think it would be possible without much hassle to use a kind of "global"
>> rate limiting only for these probe packets of a TCP connection?
>>
>>> I mean, what is the gain to be able to restart a frozen TCP session with
>>> a 1sec latency instead of 10s if it was blocked more than 60 seconds ?
>>
>> I'm afraid it does a lot, especially in highly dynamic environments. You
>> don't have just the additional latency, you may actually miss the full
>> period where connectivity was there, and then just retransmit into the next
>> connectivity disrupted period.
> 
> Problem with this is that with short and synchronized timers, all
> sessions will flood at the same time and you'll get congestion this
> time.

Why do you think the timers are "synchronized"? If you have congestion,
then you will do exponential backoff.

> The reason for exponential backoff is also to smooth the restarts of
> sessions, because timers are randomized.

If the RTOs of these sessions were "randomized", they keep this randomization
even if the backoffs are reverted; at least they should.

Best regards
Arnd

^ permalink raw reply

* Re: [PATCH] tcp: bound RTO to minimum
From: Ilpo Järvinen @ 2011-08-25 10:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Arnd Hannemann, Alexander Zimmermann, Yuchung Cheng,
	Hagen Paul Pfeifer, netdev, Lukowski Damian
In-Reply-To: <1314266562.2387.35.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>


On Thu, 25 Aug 2011, Eric Dumazet wrote:

> On Thursday, 25 August 2011 at 11:46 +0200, Arnd Hannemann wrote:
> > Hi Eric,
> > 
> > On 25.08.2011 11:09, Eric Dumazet wrote:
> 
> > > Maybe we should refine the thing a bit, to not reverse backoff unless
> > > rto is > some_threshold.
> > > 
> > > Say 10s being the value, that would give at most 92 tries.
> > 
> > I personally think that 10s would be too large and eliminate the benefit of the
> > algorithm, so I would prefer a different solution.
> > 
> > In the case of one bulk-data TCP session that was transmitting hundreds of
> > packets/s before the connectivity disruption, the worst-case rate of
> > 5 packets/s really seems conservative enough.
> > 
> > However, in the case of many idle connections that were transmitting only
> > a number of packets per minute, we might increase the rate drastically for
> > a certain period until it throttles down. You are saying that we have a
> > problem here, correct?
> > 
> > Do you think it would be possible without much hassle to use a kind of 
> > "global" rate limiting only for these probe packets of a TCP connection?
> >
> > > I mean, what is the gain to be able to restart a frozen TCP session with
> > > a 1sec latency instead of 10s if it was blocked more than 60 seconds ?
> > 
> > I'm afraid it does a lot, especially in highly dynamic environments. You
> > don't have just the additional latency, you may actually miss the full
> > period where connectivity was there, and then just retransmit into the next
> > connectivity disrupted period.
> 
> Problem with this is that with short and synchronized timers, all
> sessions will flood at the same time and you'll get congestion this
> time.
>
> The reason for exponential backoff is also to smooth the restarts of
> sessions, because timers are randomized.

But if you get real congestion, won't the system self-regulate using 
exponential backoff, due to the lack of ICMPs for some of the connections?


-- 
 i.

^ permalink raw reply

* Re: [BUG] tcp : how many times a frame can possibly be retransmitted ?
From: Ilpo Järvinen @ 2011-08-25 10:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jerry Chu, Damian Lukowski
In-Reply-To: <1314265254.2387.31.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>


On Thu, 25 Aug 2011, Eric Dumazet wrote:

> On Thursday, 25 August 2011 at 11:56 +0300, Ilpo Järvinen wrote:
> 
> > So you think that this is not true: ?
> > 
> >         /* NOTE: clamping at TCP_RTO_MIN is not required, current algo
> >          * guarantees that rto is higher.
> >          */
> > 
> > ...it would still be smaller than 1sec though, but certainly not going to 
> > cause flooding either. Default tcp_rto_min should be 200ms so it's 
> > 5pkts+5ICMP sent, received and processed per second. Which doesn't sound 
> > that bad CPU load?!?
> > 
> 
> Unless you have 100,000 active sessions, maybe?
> 
> Some years ago, I helped people running servers with more than 1,000,000
> long-lived active sessions, and a temporary network disruption was
> already very critical at that time with old kernels (back then, the IP
> route cache could blow up and consume too much RAM or CPU time; things
> are now under control).
> 
> I guess they would not try a new kernel :(
> 
> > It is unclear to me how tp->rttvar could become smaller than 
> > tcp_rto_min().
> 
> I believe this part is fine Ilpo.
> 
> As long as we handle few TCP sessions, it's fine to send 5 messages per
> session per second.

Yeah, thanks for the clarification. I was just confused by your initial 
wording, which seemed to imply that we could, at worst, end up 
doing it at full rate without any timers.

To me it seems that both cases are quite valid, with pretty much 
contradicting goals.

-- 
 i.

^ permalink raw reply

* Re: [PATCH] tcp: bound RTO to minimum
From: Eric Dumazet @ 2011-08-25 10:02 UTC (permalink / raw)
  To: Arnd Hannemann
  Cc: Alexander Zimmermann, Yuchung Cheng, Hagen Paul Pfeifer, netdev,
	Lukowski Damian
In-Reply-To: <4E5619DA.6070902@arndnet.de>

On Thursday, 25 August 2011 at 11:46 +0200, Arnd Hannemann wrote:
> Hi Eric,
> 
> On 25.08.2011 11:09, Eric Dumazet wrote:

> > Maybe we should refine the thing a bit, to not reverse backoff unless
> > rto is > some_threshold.
> > 
> > Say 10s being the value, that would give at most 92 tries.
> 
> I personally think that 10s would be too large and eliminate the benefit of the
> algorithm, so I would prefer a different solution.
> 
> In the case of one bulk-data TCP session that was transmitting hundreds of
> packets/s before the connectivity disruption, the worst-case rate of
> 5 packets/s really seems conservative enough.
> 
> However, in the case of many idle connections that were transmitting only
> a number of packets per minute, we might increase the rate drastically for
> a certain period until it throttles down. You are saying that we have a
> problem here, correct?
> 
> Do you think it would be possible without much hassle to use a kind of "global"
> rate limiting only for these probe packets of a TCP connection?
> 
> > I mean, what is the gain to be able to restart a frozen TCP session with
> > a 1sec latency instead of 10s if it was blocked more than 60 seconds ?
> 
> I'm afraid it does a lot, especially in highly dynamic environments. You
> don't have just the additional latency, you may actually miss the full
> period where connectivity was there, and then just retransmit into the next
> connectivity disrupted period.

Problem with this is that with short and synchronized timers, all
sessions will flood at the same time and you'll get congestion this
time.

The reason for exponential backoff is also to smooth the restarts of
sessions, because timers are randomized.

^ permalink raw reply
