Netdev List

* Re: [PATCH 2/9] IB: amso1100: convert to SKB paged frag API.
From: Steve Wise @ 2011-08-25 14:42 UTC
  To: Ian Campbell
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Tom Tucker, Roland Dreier,
	Sean Hefty, Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1314260895-15936-2-git-send-email-ian.campbell-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>

Acked-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>

* Re: [RFC] per-containers tcp buffer limitation
From: Chris Friesen @ 2011-08-25 15:05 UTC
  To: Daniel Wagner
  Cc: Eric W. Biederman, KAMEZAWA Hiroyuki, Glauber Costa,
	Linux Containers, netdev, David Miller, Pavel Emelyanov
In-Reply-To: <4E56464B.4070304@monom.org>

On 08/25/2011 06:55 AM, Daniel Wagner wrote:

> I'd like to solve a use case where it is necessary to count all bytes
> transmitted and received by an application [1]. So far I have found two
> unsatisfying solutions for it. The first one is to hook into libc and
> count the bytes there. I don't think I have to say I don't like this.

Is there any particular reason you can't use LD_PRELOAD to interpose a 
library to do the statistics monitoring?
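For illustration, a minimal sketch of such an interposition library, assuming glibc and a dynamically linked target. It wraps only send() and recv() (a real tool would also need read()/write(), sendto()/recvfrom(), sendmsg()/recvmsg() and friends), and the preload_bytes_*() accessors are hypothetical names:

```c
/* count.c: hypothetical LD_PRELOAD shim counting bytes that pass through
 * send() and recv(). Only dynamically linked callers are affected. */
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

typedef ssize_t (*send_fn)(int, const void *, size_t, int);
typedef ssize_t (*recv_fn)(int, void *, size_t, int);

static send_fn real_send;
static recv_fn real_recv;
static atomic_ullong bytes_sent;
static atomic_ullong bytes_received;

/* Resolve the next definitions in lookup order (normally libc's) once,
 * when the shim is loaded. */
static void __attribute__((constructor)) resolve_real_calls(void)
{
    real_send = (send_fn)dlsym(RTLD_NEXT, "send");
    real_recv = (recv_fn)dlsym(RTLD_NEXT, "recv");
    assert(real_send != NULL && real_recv != NULL);
}

ssize_t send(int fd, const void *buf, size_t len, int flags)
{
    ssize_t n = real_send(fd, buf, len, flags);
    if (n > 0)
        atomic_fetch_add(&bytes_sent, (unsigned long long)n);
    return n;
}

ssize_t recv(int fd, void *buf, size_t len, int flags)
{
    ssize_t n = real_recv(fd, buf, len, flags);
    if (n > 0)
        atomic_fetch_add(&bytes_received, (unsigned long long)n);
    return n;
}

/* Totals so far; a destructor could log these when the process exits. */
unsigned long long preload_bytes_sent(void) { return atomic_load(&bytes_sent); }
unsigned long long preload_bytes_received(void) { return atomic_load(&bytes_received); }
```

Built with something like gcc -shared -fPIC -o libcount.so count.c -ldl and run as LD_PRELOAD=./libcount.so app, it only sees calls resolved through the dynamic linker, so statically linked binaries are not covered.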

Chris

-- 
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com

* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
From: Tom Herbert @ 2011-08-25 15:19 UTC
  To: jhs; +Cc: davem, netdev, Johannes Berg
In-Reply-To: <1312808784.17202.39.camel@mojatatu>

> For wired connections I think the big deal is in improved
> runtime memory saving (your perf numbers are kinda ok).
> The challenge is going to be with wireless where the underlying
> bandwidth changes (and therefore the optimal queue size varies
> more frequently). The problem with active queue management is
> getting the feedback loop to be more accurate and I think there
> will be challenges with wired devices.

The important characteristic (for us at least) will be reduced latency
for high-priority packets (for NICs that don't support QoS multiqueue).
I do have data showing those benefits, but it's a little old.  I will
have something to present at LPC.

> I notice that you don't have any wireless devices;
> but it would be nice for someone to check this out on wireless.
> CCing Johannes - maybe he has some insight.
>
Yeah, these scare me ;-)

> cheers,
> jamal

* [PATCH] cassini: init before use in cas_interruptN.
From: Francois Romieu @ 2011-08-25 15:02 UTC
  To: Thomas Jarosch; +Cc: netdev, David S. Miller
In-Reply-To: <201108251558.45290.thomas.jarosch@intra2net.com>

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Spotted-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
---

 David, any opinion regarding the removal of the USE_NAPI #ifdef
 in this driver?

 drivers/net/cassini.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/cassini.c b/drivers/net/cassini.c
index 646c86b..fdb7a17 100644
--- a/drivers/net/cassini.c
+++ b/drivers/net/cassini.c
@@ -2452,14 +2452,13 @@ static irqreturn_t cas_interruptN(int irq, void *dev_id)
 	struct net_device *dev = dev_id;
 	struct cas *cp = netdev_priv(dev);
 	unsigned long flags;
-	int ring;
+	int ring = (irq == cp->pci_irq_INTC) ? 2 : 3;
 	u32 status = readl(cp->regs + REG_PLUS_INTRN_STATUS(ring));
 
 	/* check for shared irq */
 	if (status == 0)
 		return IRQ_NONE;
 
-	ring = (irq == cp->pci_irq_INTC) ? 2 : 3;
 	spin_lock_irqsave(&cp->lock, flags);
 	if (status & INTR_RX_DONE_ALT) { /* handle rx separately */
 #ifdef USE_NAPI
-- 
1.7.4.4
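Before this patch, ring was used as the index for REG_PLUS_INTRN_STATUS() before it had been assigned, so the status was read from an unpredictable register. A standalone sketch of the corrected ordering; the IRQ numbers, the fake register array and handle_irq() are hypothetical stand-ins for the driver's MMIO access:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's IRQ numbers and per-ring
 * interrupt status registers. */
enum { IRQ_INTC = 10, IRQ_INTD = 11, NUM_RINGS = 4 };

static uint32_t fake_intrn_status[NUM_RINGS];

static uint32_t read_ring_status(int ring)
{
    assert(ring >= 0 && ring < NUM_RINGS);
    return fake_intrn_status[ring];
}

/* Mirrors the fixed cas_interruptN(): derive the ring from the IRQ first,
 * then use it to select the status register. Returns -1 for "not ours"
 * (IRQ_NONE in the driver), otherwise the ring that was serviced. */
static int handle_irq(int irq)
{
    int ring = (irq == IRQ_INTC) ? 2 : 3;   /* initialised before use */
    uint32_t status = read_ring_status(ring);

    if (status == 0)                         /* shared IRQ, nothing pending */
        return -1;
    return ring;
}
```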

* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
From: Tom Herbert @ 2011-08-25 15:29 UTC
  To: Johannes Berg; +Cc: jhs, davem, netdev
In-Reply-To: <1312809524.4372.29.camel@jlt3.sipsolutions.net>

> Well, the wireless case is curious, and has a whole bunch of corner
> cases, since it's not necessarily PtP, it can be PtMP!
>
> But considering the most basic case of us being a client connecting to
> an AP first: yes, the bandwidth will change dynamically, I don't know
> what impact this has on BQL, Tom, maybe you can think about this a bit?
>
BQL is dynamic, and will increase the queue limit more aggressively
than decrease it.  So, for instance, we can track the largest queue
needed over 30 seconds, which should be stable even in the presence
of fluctuating bandwidth.  The thing that worries me is rather
whether the HW queues conform to the queue characteristics described
in the patch.  If transmit completions are random and not regular, BQL
probably can't function well.
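A much-simplified sketch of the grow-fast/shrink-slow behaviour described above, not the code in this patch series: the limit is raised as soon as the hardware queue runs dry while data is still waiting, but is only lowered, after a hold period, by the smallest headroom that went unused during that period. The 30-second hold comes from the paragraph above; the struct, field names, minimum-limit floor and exact update rules are assumptions.

```c
#include <assert.h>

/* Hypothetical floor so the limit never collapses to zero (one full frame). */
#define BQL_SKETCH_MIN_LIMIT 1514u

/* Hypothetical, much-simplified byte queue limit state; times in seconds. */
struct bql_sketch {
    unsigned int limit;        /* bytes the driver may have queued to the HW */
    unsigned int min_slack;    /* smallest unused headroom seen this period */
    unsigned int hold_time;    /* observation period before shrinking */
    unsigned int period_start; /* when the current period began */
};

static void bql_init(struct bql_sketch *q, unsigned int limit, unsigned int now)
{
    q->limit = limit;
    q->min_slack = limit;
    q->hold_time = 30;
    q->period_start = now;
}

/* Called on each TX completion.
 *   completed: bytes just completed by the HW
 *   inflight:  bytes still queued to the HW afterwards
 *   starved:   nonzero if the HW queue ran empty while more data was waiting */
static void bql_completed(struct bql_sketch *q, unsigned int completed,
                          unsigned int inflight, int starved, unsigned int now)
{
    if (starved) {
        /* Grow at once: the limit was too small to keep the HW busy. */
        q->limit += completed;
        q->min_slack = q->limit;
        q->period_start = now;
        return;
    }

    /* Remember the least headroom that went unused in this period. */
    unsigned int slack = q->limit > inflight ? q->limit - inflight : 0;
    if (slack < q->min_slack)
        q->min_slack = slack;

    /* Shrink slowly: only after a whole hold period, and only by the
     * headroom that was never needed during it. */
    if (now - q->period_start >= q->hold_time) {
        unsigned int target = q->limit - q->min_slack;
        q->limit = target > BQL_SKETCH_MIN_LIMIT ? target : BQL_SKETCH_MIN_LIMIT;
        assert(q->limit >= BQL_SKETCH_MIN_LIMIT);
        q->min_slack = q->limit;
        q->period_start = now;
    }
}
```

Using the smallest observed slack is what keeps a temporary dip in bandwidth from shrinking the limit: a single moment of high occupancy during the hold period is enough to preserve the headroom.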

If you'd like to bring this up on some wireless devices, that would be
great.  I don't have easy access to any right now, but I can try to
help otherwise.

> The second big challenge in wireless is the PtMP case: if we're acting
> as an AP, then we typically have four queues for any number of remote
> endpoints with varying bandwidth. I haven't found a good way to handle
> this, we can't have hardware queues per station (most HW is simply not
> capable of that many queues) but technically we would want to make the
> queue limits depend on the peer...
>
> Since I just returned from vacation I have tons of email to dig through
> I'll have to keep this short for now, but I'm definitely interested.
>
> johannes

* RE: [PATCH 3/9] IB: nes: convert to SKB paged frag API.
From: Latif, Faisal @ 2011-08-25 15:33 UTC
  To: Ian Campbell
  Cc: Roland Dreier, Hefty, Sean, Hal Rosenstock,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <1314260895-15936-3-git-send-email-ian.campbell@citrix.com>



Acked-by: Faisal Latif <faisal.latif@intel.com>

Thanks.

> ---
>  drivers/infiniband/hw/nes/nes_nic.c |   21 +++++++++++----------
>  1 files changed, 11 insertions(+), 10 deletions(-)

* Re: linux-next: build failure after merge of the staging tree
From: Greg KH @ 2011-08-25 15:39 UTC
  To: Larry Finger
  Cc: Stephen Rothwell, linux-next, linux-kernel, wlanfae, Jiri Pirko,
	David Miller, netdev
In-Reply-To: <4E55DA8C.10102@lwfinger.net>

On Thu, Aug 25, 2011 at 12:15:56AM -0500, Larry Finger wrote:
> On 08/25/2011 12:02 AM, Stephen Rothwell wrote:
> >Hi Greg,
> >
> >After merging the staging tree, today's linux-next build (x86_64
> >allmodconfig) failed like this:
> >
> >drivers/staging/rtl8192e/rtl_core.c:2917:2: error: unknown field 'ndo_set_multicast_list' specified in initializer
> >
> >Caused by commit 94a799425eee ("From: wlanfae<wlanfae@realtek.com>" -
> >really "[PATCH 1/8] rtl8192e: Import new version of driver from realtek"
> >Larry, that patch was badly imported ...) interacting with commit
> >b81693d9149c ("net: remove ndo_set_multicast_list callback") from the net
> >tree.
> >
> >I applied the following patch (which seems to be what was done to the
> >other drivers in the net tree - there is probably more required):
> >
> >From: Stephen Rothwell<sfr@canb.auug.org.au>
> >Date: Thu, 25 Aug 2011 14:57:55 +1000
> >Subject: [PATCH] rtl8192e: update for ndo_set_multicast_list removal.
> >
> >Signed-off-by: Stephen Rothwell<sfr@canb.auug.org.au>
> >---
> >  drivers/staging/rtl8192e/rtl_core.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> >diff --git a/drivers/staging/rtl8192e/rtl_core.c b/drivers/staging/rtl8192e/rtl_core.c
> >index f8a13d9..b38f626 100644
> >--- a/drivers/staging/rtl8192e/rtl_core.c
> >+++ b/drivers/staging/rtl8192e/rtl_core.c
> >@@ -2914,7 +2914,7 @@ static const struct net_device_ops rtl8192_netdev_ops = {
> >  	.ndo_stop = rtl8192_close,
> >  	.ndo_tx_timeout = rtl8192_tx_timeout,
> >  	.ndo_do_ioctl = rtl8192_ioctl,
> >-	.ndo_set_multicast_list = r8192_set_multicast,
> >+	.ndo_set_rx_mode = r8192_set_multicast,
> >  	.ndo_set_mac_address = r8192_set_mac_adr,
> >  	.ndo_validate_addr = eth_validate_addr,
> >  	.ndo_change_mtu = eth_change_mtu,
> 
> Stephen,
> 
> Thanks for the notice. It seems that commit b81693d9149c had not
> made it into my copy of staging. I'll look into the issue.

It wouldn't ever make it there, as that's coming from the net-next tree,
so this will have to wait until stuff merges together in Linus's tree.

thanks,

greg k-h

* Re: [RFC] per-containers tcp buffer limitation
From: Stephen Hemminger @ 2011-08-25 15:44 UTC
  To: Chris Friesen
  Cc: Daniel Wagner, Eric W. Biederman, KAMEZAWA Hiroyuki,
	Glauber Costa, Linux Containers, netdev, David Miller,
	Pavel Emelyanov
In-Reply-To: <4E5664B5.6000806@genband.com>

You seem to have forgotten the work of your forefathers. When appealing
to history you must understand it first.

What about using netfilter (with extensions)? We already have iptables
module to match on uid or gid. It wouldn't be hard to extend this to
other bits of meta data like originating and target containers.

You could also use this to restrict access to ports and hosts on
a per-container basis.

* [PATCH net-next] net_sched: sfb: optimize enqueue on full queue
From: Eric Dumazet @ 2011-08-25 16:21 UTC
  To: David Miller; +Cc: netdev

In case the SFB queue is full (hard limit reached), there is no point
in spending time computing the hash and the maximum qlen/p_mark.

Instead, we just drop the packet early.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/sched/sch_sfb.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 0a833d0..e83c272 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -287,6 +287,12 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	u32 r, slot, salt, sfbhash;
 	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 
+	if (unlikely(sch->q.qlen >= q->limit)) {
+		sch->qstats.overlimits++;
+		q->stats.queuedrop++;
+		goto drop;
+	}
+
 	if (q->rehash_interval > 0) {
 		unsigned long limit = q->rehash_time + q->rehash_interval;
 
@@ -332,12 +338,9 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	slot ^= 1;
 	sfb_skb_cb(skb)->hashes[slot] = 0;
 
-	if (unlikely(minqlen >= q->max || sch->q.qlen >= q->limit)) {
+	if (unlikely(minqlen >= q->max)) {
 		sch->qstats.overlimits++;
-		if (minqlen >= q->max)
-			q->stats.bucketdrop++;
-		else
-			q->stats.queuedrop++;
+		q->stats.bucketdrop++;
 		goto drop;
 	}
 

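The change follows a general pattern: test the cheap, global condition before doing per-packet work. A simplified sketch of that ordering, not the sch_sfb code; the queue struct, flow_hash() and the counters are hypothetical and only mirror the roles of q->limit, the hash/bin lookup, and the queuedrop/bucketdrop statistics:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical queue state; only the fields needed for the illustration. */
struct fq_sketch {
    unsigned int qlen, limit;     /* current length and hard limit */
    unsigned int bin_max;         /* per-bin limit (q->max in SFB) */
    unsigned int bins[16];        /* per-bin occupancy */
    unsigned long queuedrop, bucketdrop, hash_calls;
};

/* Stand-in for the comparatively expensive flow hash + bin lookup. */
static unsigned int flow_hash(struct fq_sketch *q, uint32_t key)
{
    q->hash_calls++;
    key ^= key >> 16;
    key *= 0x45d9f3bu;
    key ^= key >> 16;
    return key % 16;
}

/* Returns 1 if the packet was queued, 0 if it was dropped. */
static int enqueue(struct fq_sketch *q, uint32_t key)
{
    /* Early drop: a full queue rejects the packet whatever its flow,
     * so skip the hashing entirely. */
    if (q->qlen >= q->limit) {
        q->queuedrop++;
        return 0;
    }

    unsigned int bin = flow_hash(q, key);
    assert(bin < 16);
    if (q->bins[bin] >= q->bin_max) {
        q->bucketdrop++;
        return 0;
    }
    q->bins[bin]++;
    q->qlen++;
    return 1;
}
```

With the check first, a packet arriving at a full queue costs one comparison instead of a hash computation plus bin lookups.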

* Traffic shaping - class ID 16bit limit?
From: Miroslav Kratochvil @ 2011-08-25 16:28 UTC
  To: netdev

Hello everyone,

the question is simple: What should I do if I need to have more than
2^16 subclasses of a classful queuing discipline (in, say, hfsc or
htb)?

I bumped into this problem while writing some traffic shaping
software and thinking about scalability. There are other ways
to have more than 64k "classes" (like grouping some subclasses into
separate qdiscs), but they have significant drawbacks (they require more
tc-filter rules and decisions, generally more processing power, and
the structure is quite hard to maintain).

Technically the ClassID seems to be "hardcoded" as a 16bit value, but
after some source searching, I haven't found any good reason for it to
be 16-bit only.

I understand that those IDs are usually handled together with another
16-bit qdisc ID, which would add up to quite a big number (possibly
unpleasant on some architectures) if both were 32-bit.
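For reference, a tc handle is one 32-bit value split into a 16-bit major number (the qdisc) and a 16-bit minor number (the class), which is where the 64k ceiling comes from. The helpers below mirror the TC_H_MAKE() / TC_H_MAJ_MASK / TC_H_MIN_MASK layout from <linux/pkt_sched.h>, but are defined locally under hypothetical names so the example stands alone:

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as TC_H_MAJ_MASK / TC_H_MIN_MASK in <linux/pkt_sched.h>. */
#define HANDLE_MAJ_MASK 0xFFFF0000u
#define HANDLE_MIN_MASK 0x0000FFFFu

/* Equivalent of TC_H_MAKE(maj, min); maj is already shifted (0x10000 is "1:"). */
static uint32_t handle_make(uint32_t maj, uint32_t min)
{
    return (maj & HANDLE_MAJ_MASK) | (min & HANDLE_MIN_MASK);
}

/* "1:2a" in tc syntax: qdisc major 1, class minor 0x2a. */
static uint32_t classid(uint16_t major, uint16_t minor)
{
    return handle_make((uint32_t)major << 16, minor);
}
```

A minor of 0x10000 is masked to 0, i.e. the qdisc handle "1:" itself, so more classes per qdisc really would need a wider field.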

I also completely understand that in most cases of common usage
there's absolutely no need for this many subclasses, but
on the other hand there's still no reason to assume "64k classes is enough
for everyone". :D

Of course if there's some obvious method to solve this, or a patch, or
some kind of workaround that I haven't found, please let me know about
it, I will happily use it.

Thanks for any suggestions,
Mirek Kratochvil

* Re: Traffic shaping - class ID 16bit limit?
From: Stephen Hemminger @ 2011-08-25 16:39 UTC
  To: Miroslav Kratochvil; +Cc: netdev
In-Reply-To: <CAO0uZ+-fv89Z3-9+vh5kN93xe=Uw8b=PSfnAqosOjUBP6PcVNg@mail.gmail.com>

On Thu, 25 Aug 2011 18:28:01 +0200
Miroslav Kratochvil <exa.exa@gmail.com> wrote:

> Technically the ClassID seems to be "hardcoded" as a 16bit value, but
> after some source searching, I haven't found any good reason for it to
> be 16-bit only.

Granted it was a poor choice in the initial design.
It is wired into the API and changing it would be quite painful.

You might be able to do the same thing by splitting traffic
into multiple virtual devices (dummy or ifb) and then doing
another layer.

* Re: Traffic shaping - class ID 16bit limit?
From: Miroslav Kratochvil @ 2011-08-25 17:06 UTC
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20110825093937.2a8a1457@nehalam.ftrdhcpuser.net>

>> Technically the ClassID seems to be "hardcoded" as a 16bit value, but
>> after some source searching, I haven't found any good reason for it to
>> be 16-bit only.
>
> Granted it was a poor choice in the initial design.
> It is wired into the API and changing it would be quite painful.
>

I had a feeling something like that would come.

If I get it correctly, the API change would consist of:

- some netlink protocol change
- slight modification of qdisc_class_hash
- modifications in all (four?) hierarchical schedulers
- tiny expansion of userspace tc utility

which isn't that painful (except for the CBQ part), but I'm probably
missing something, and presumably the change would take some time to
get mainstream -- probably way more time than writing an hfsc clone
that is controlled through some interface other than tc/netlink. :(

(but hey! I have a topic for school work!)

> You might be able to do the same thing by splitting traffic
> into multiple virtual devices (dummy or ifb) and then doing
> another layer.
>

My scenario looks pretty simple, mostly like a big hashing filter
attached at the device root, flowid'ing the stuff to leaf classes.
Could you please provide some simple illustration of splitting that
into multiple devices? I guess that the main problem with this
approach would be that my subclasses usually don't share anything in
common, especially not any pretty IP prefixes that would allow good
splitting.
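One possible shape for the split Stephen suggests, sketched under assumptions: each of N virtual devices (for example ifb0, ifb1, ...) carries its own root qdisc and therefore its own 16-bit minor space, and the big hashing filter picks the device first and the class second. The numbering scheme below (minor 0 left for the qdisc itself, so 65535 leaf classes per device) and the names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Minor 0 names the qdisc itself ("1:"), so usable class minors are 1..0xFFFF. */
#define CLASSES_PER_DEV 0xFFFFu

struct leaf_location {
    uint32_t dev_index;   /* which virtual device, e.g. ifb<dev_index> */
    uint16_t minor;       /* class minor on that device's root qdisc */
};

/* Map a flat, 0-based leaf number onto (device, class minor). */
static struct leaf_location leaf_to_location(uint32_t leaf)
{
    struct leaf_location loc;

    loc.dev_index = leaf / CLASSES_PER_DEV;
    loc.minor = (uint16_t)(leaf % CLASSES_PER_DEV + 1);
    assert(loc.minor != 0);
    return loc;
}
```

Because the split is by leaf number rather than by address, it does not depend on shared IP prefixes; the cost is a first-level classification step that steers each packet to its device.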

Anyway, thanks very much for response!

-mk

* Re: Traffic shaping - class ID 16bit limit?
From: Stephen Hemminger @ 2011-08-25 17:10 UTC
  To: Miroslav Kratochvil; +Cc: netdev
In-Reply-To: <CAO0uZ+_6xC0gymfbu28PRK4SaVgkGaSbbe-PgXvZ4h-cPp8k2A@mail.gmail.com>

On Thu, 25 Aug 2011 19:06:58 +0200
Miroslav Kratochvil <exa.exa@gmail.com> wrote:

> >> Technically the ClassID seems to be "hardcoded" as a 16bit value, but
> >> after some source searching, I haven't found any good reason for it to
> >> be 16-bit only.
> >
> > Granted it was a poor choice in the initial design.
> > It is wired into the API and changing it would be quite painful.
> >
> 
> I was feeling something like that would come.
> 
> If I get it correctly, the API change would consist of:
> 
> - some netlink protocol change
> - slight modification of qdisc_class_hash
> - modifications in all (four?) hierarchical schedulers
> - tiny expansion of userspace tc utility

And all the magic compatibility layers to make old code
work with new code.

* Re: [RFC] per-containers tcp buffer limitation
From: Glauber Costa @ 2011-08-25 18:02 UTC
  To: Eric W. Biederman
  Cc: KAMEZAWA Hiroyuki, Linux Containers, netdev, David Miller,
	Pavel Emelyanov
In-Reply-To: <m14o16qlq1.fsf@fess.ebiederm.org>

On 08/24/2011 11:16 PM, Eric W. Biederman wrote:
> KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>  writes:
>
>> On Wed, 24 Aug 2011 22:28:59 -0300
>> Glauber Costa<glommer@parallels.com>  wrote:
>>
>>> On 08/24/2011 09:35 PM, Eric W. Biederman wrote:
>>>> Glauber Costa<glommer@parallels.com>   writes:
>>> Hi Eric,
>>>
>>> Thanks for your attention.
>>>
>>> So, this that you propose was my first implementation. I ended up
>>> throwing it away after playing with it for a while.
>>>
>>> One of the first problems that arise from that, is that the sysctls are
>>> a tunable visible from inside the container. Those limits, however, are
>>> to be set from the outside world. The code is not much better than that
>>> either, and instead of creating new cgroup structures and linking them
>>> to the protocol, we end up doing it for net ns. We end up increasing
>>> structures just the same...
>
> You don't need to add a netns member to sockets.
But then you have to grow the netns structure itself somehow.
>
> But I do agree that there are odd permission issues with using the
> existing sysctls and making them per namespace.
>
> However almost everything I have seen with memory limits I have found
> very strange.  They all seem like a very bad version of disabling memory
> over commits.

More or less. At least from our perspective, the only things we're really
interested in capping are non-swappable resources. So you could not
overcommit anyway.

The sockets/tcp case is even easier. The code as it is today already
allows you to define soft and hard memory limits: I am just making
them container-wide, instead of system-wide.
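The soft/hard semantics referred to here are those of the existing three-value tcp_mem sysctl (low, pressure, high, in pages). A much-simplified model of applying them per group instead of system-wide, with hypothetical names; the real __sk_mem_schedule() logic also has per-socket minimums and other exceptions that are left out, as is what a protocol actually does while under pressure:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-group socket memory state, in pages. The three
 * thresholds play the roles of tcp_mem's low / pressure / high values. */
struct sock_group {
    long low;             /* below this, leave memory pressure */
    long pressure;        /* above this, enter memory pressure (soft limit) */
    long high;            /* never charge beyond this (hard limit) */
    long allocated;       /* pages currently charged to the group */
    bool under_pressure;
};

/* Try to charge `pages` to the group; returns false if refused. */
static bool group_charge(struct sock_group *g, long pages)
{
    long next = g->allocated + pages;

    if (next > g->high)           /* hard limit: suppress the allocation */
        return false;
    g->allocated = next;
    if (next > g->pressure)       /* soft limit crossed: signal pressure */
        g->under_pressure = true;
    return true;
}

static void group_uncharge(struct sock_group *g, long pages)
{
    assert(pages <= g->allocated);
    g->allocated -= pages;
    if (g->allocated < g->low)    /* back under the low mark */
        g->under_pressure = false;
}
```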

>>> Also, since we're doing resource control, it seems more natural to use
>>> cgroups. Now, the fact that there is no correlation whatsoever between
>>> cgroups and namespaces does bother me. But that's another story, much
>>> broader and more general than this patch.
>>>
>>
>> I think using cgroup makes sense. A question in mind is whether it is
>> better to integrate this kind of 'memory usage' controls to memcg or
>> not.
>
> Maybe.  When sockets start getting a cgroup member I start wondering,
> how many cgroup members will sockets potentially belong to.
>
>> How do you think ? IMHO, having cgroup per class of object is messy.
>> ...
>> How about adding
>> 	memory.tcp_mem
>> to memcg ?
>>
>> Or, adding kmem cgroup ?
>>
>>> About overhead, since this is the first RFC, I did not care about
>>> measuring. However, it seems trivial to me to guarantee at least
>>> that it won't impose a significant performance penalty when it is
>>> compiled out. If we're moving forward with this implementation, I will
>>> include data in the next release so we can discuss in this basis.
>>>
>>
>> IMHO, you should show performance number even if RFC. Then, people will
>> see patch with more interests.
>
> And also compiled out doesn't really count.  Cgroups are something you
> want people to compile into distributions for the common case, and you
> don't want to impose a noticeable performance penalty for the common
> case.
Absolutely agreed.

* Re: [RFC] per-containers tcp buffer limitation
From: Glauber Costa @ 2011-08-25 18:05 UTC
  To: KAMEZAWA Hiroyuki
  Cc: Eric W. Biederman, Linux Containers, netdev, David Miller,
	Pavel Emelyanov
In-Reply-To: <20110825104956.41c4b60e.kamezawa.hiroyu@jp.fujitsu.com>

On 08/24/2011 10:49 PM, KAMEZAWA Hiroyuki wrote:
> On Wed, 24 Aug 2011 22:28:59 -0300
> Glauber Costa<glommer@parallels.com>  wrote:
>
>> On 08/24/2011 09:35 PM, Eric W. Biederman wrote:
>>> Glauber Costa<glommer@parallels.com>   writes:
>>>
>>>> Hello,
>>>>
>>>> This is a proof of concept of some code I have here to limit tcp send and
>>>> receive buffers per-container (in our case). At this phase, I am more concerned
>>>> in discussing my approach, so please curse my family no further than the 3rd
>>>> generation.
>>>>
>>>> The problem we're trying to attack here, is that buffers can grow and fill
>>>> non-reclaimable kernel memory. When doing containers, we can't afford having a
>>>> malicious container pinning kernel memory at will, therefore exhausting all the
>>>> others.
>>>>
>>>> So here a container will be seen in the host system as a group of tasks, grouped
>>>> in a cgroup. This cgroup will have files allowing us to specify global
>>>> per-cgroup limits on buffers. For that purpose, I created a new sockets cgroup -
>>>> didn't really think any other one of the existing would do here.
>>>>
>>>> As for the network code per-se, I tried to keep the same code that deals with
>>>> memory schedule as a basis and make it per-cgroup.
>>>> You will notice that struct proto now take function pointers to values
>>>> controlling memory pressure and will return per-cgroup data instead of global
>>>> ones. So the current behavior is maintained: after the first threshold is hit,
>>>> we enter memory pressure. After that, allocations are suppressed.
>>>>
>>>> Only tcp code was really touched here. udp had the pointers filled, but we're
>>>> not really controlling anything. But the fact that this lives in generic code,
>>>> makes it easier to do the same for other protocols in the future.
>>>>
>>>> For this patch specifically, I am not touching - just provisioning -
>>>> rmem and wmem specific knobs. I should also #ifdef a lot of this, but hey,
>>>> remember: rfc...
>>>>
>>>> One drawback of this approach I found, is that cgroups does not really work well
>>>> with modules. A lot of the network code is modularized, so this would have to be
>>>> fixed somehow.
>>>>
>>>> Let me know what you think.
>>>
>>> Can you implement this by making the existing network sysctls per
>>> network namespace?
>>>
>>> At a quick skim it looks to me like you can make the existing sysctls
>>> per network namespace and solve the issues you are aiming at solving and
>>> that should make the code much simpler, than your proof of concept code.
>>>
>>> Any implementation of this needs to answer the question how much
>>> overhead does this extra accounting add.  I don't have a clue how much
>>> overhead you are adding but you are making structures larger and I
>>> suspect adding at least another cache line miss, so I suspect your
>>> changes will impact real world socket performance.
>>
>> Hi Eric,
>>
>> Thanks for your attention.
>>
>> So, this that you propose was my first implementation. I ended up
>> throwing it away after playing with it for a while.
>>
>> One of the first problems that arise from that, is that the sysctls are
>> a tunable visible from inside the container. Those limits, however, are
>> to be set from the outside world. The code is not much better than that
>> either, and instead of creating new cgroup structures and linking them
>> to the protocol, we end up doing it for net ns. We end up increasing
>> structures just the same...
>>
>> Also, since we're doing resource control, it seems more natural to use
>> cgroups. Now, the fact that there is no correlation whatsoever between
>> cgroups and namespaces does bother me. But that's another story, much
>> broader and more general than this patch.
>>
>
> I think using cgroup makes sense. A question in mind is whether it is
> better to integrate this kind of 'memory usage' controls to memcg or not.
>
> How do you think ? IMHO, having cgroup per class of object is messy.
> ...
> How about adding
> 	memory.tcp_mem
> to memcg ?
>
> Or, adding kmem cgroup ?

I don't really care which cgroup we use. I chose a new socket one
because sockets are usually not like other objects. People love tweaking
network aspects, and it is not hard to imagine people wanting to extend it.

Whether all of this will ever belong in a cgroup is, of course, a different
matter.

Between your two suggestions, I like kmem better. It then makes it
absolutely clear that we will handle kernel objects only...

>> About overhead, since this is the first RFC, I did not care about
>> measuring. However, it seems trivial to me to guarantee at least
>> that it won't impose a significant performance penalty when it is
>> compiled out. If we're moving forward with this implementation, I will
>> include data in the next release so we can discuss in this basis.
>>
>
> IMHO, you should show performance number even if RFC. Then, people will
> see patch with more interests.

Let's call this one pre-RFC then.

* Re: [RFC] per-containers tcp buffer limitation
From: Glauber Costa @ 2011-08-25 18:11 UTC
  To: Stephen Hemminger
  Cc: Chris Friesen, Daniel Wagner, Eric W. Biederman,
	KAMEZAWA Hiroyuki, Linux Containers, netdev, David Miller,
	Pavel Emelyanov
In-Reply-To: <20110825084415.3c3094e8@nehalam.ftrdhcpuser.net>

On 08/25/2011 12:44 PM, Stephen Hemminger wrote:
> You seem to have forgotten the work of your forefathers. When appealing
> to history you must understand it first.
>
> What about using netfilter (with extensions)? We already have iptables
> module to match on uid or gid. It wouldn't be hard to extend this to
> other bits of meta data like originating and target containers.
>
> You could also use this to restrict access to ports and hosts on
> a per container basis.
>

Hello Stephen,

I am pretty sure netfilter can provide us with amazing functionality 
that will help our containers implementation a lot.

I don't think, however, that memory limitation belongs in there. First
of all, IIRC, we are not dropping packets, re-routing, dealing with any
low-level characteristics, etc. We're just controlling buffer size. This
seems orthogonal to the work of netfilter.

Think, for instance, of the soft limit: when we hit it, we enter a memory
pressure scenario. How would netfilter handle that?

So I guess cgroup is still better suited for this very specific task we
have in mind here. For most of the others, I have no doubt that
netfilter would come in handy.

Thanks for your time!

* Re: [RFC] per-containers tcp buffer limitation
From: Daniel Wagner @ 2011-08-25 18:27 UTC
  To: Chris Friesen
  Cc: Eric W. Biederman, KAMEZAWA Hiroyuki, Glauber Costa,
	Linux Containers, netdev, David Miller, Pavel Emelyanov
In-Reply-To: <4E5664B5.6000806@genband.com>

Hi Chris,

On 08/25/2011 05:05 PM, Chris Friesen wrote:
> On 08/25/2011 06:55 AM, Daniel Wagner wrote:
> 
>> I'd like to solve a use case where it is necessary to count all bytes
>> transmitted and received by an application [1]. So far I have found two
>> unsatisfying solutions for it. The first one is to hook into libc and
>> count the bytes there. I don't think I have to say I don't like this.
> 
> Is there any particular reason you can't use LD_PRELOAD to interpose a
> library to do the statistics monitoring?

This is certainly possible for any dynamically linked application. I
think it wouldn't work for statically linked ones. Currently I don't know
whether I will have to deal with such applications in the project I am on.
The reason I am not a big fan is that the LD_PRELOAD trick seems very
hackish to me.

As Glauber has argued in this thread, there are probably quite a few
people who want to control or monitor sockets. It seems I am one of
those. Having this kind of support in cgroups seems like a very neat
solution to me.

thanks,
daniel

* Re: [RFC] per-containers tcp buffer limitation
From: Daniel Wagner @ 2011-08-25 18:33 UTC
  To: Stephen Hemminger
  Cc: Chris Friesen, Eric W. Biederman, KAMEZAWA Hiroyuki,
	Glauber Costa, Linux Containers, netdev, David Miller,
	Pavel Emelyanov
In-Reply-To: <20110825084415.3c3094e8@nehalam.ftrdhcpuser.net>

Hi Stephen,

On 08/25/2011 05:44 PM, Stephen Hemminger wrote:
> What about using netfilter (with extensions)? We already have iptables
> module to match on uid or gid. It wouldn't be hard to extend this to
> other bits of meta data like originating and target containers.

From reading the man pages, the "owner" extension of netfilter would only
allow matching on outgoing traffic. Would it be possible to extend this
to also match on incoming traffic? Sorry to be completely ignorant here.

thanks,
daniel

* Re: [RFC] per-containers tcp buffer limitation
From: Daniel Wagner @ 2011-08-25 18:45 UTC
  To: Stephen Hemminger
  Cc: Chris Friesen, Eric W. Biederman, KAMEZAWA Hiroyuki,
	Glauber Costa, Linux Containers, netdev, David Miller,
	Pavel Emelyanov
In-Reply-To: <4E569571.1080603@monom.org>

Hi Stephen,

> On 08/25/2011 05:44 PM, Stephen Hemminger wrote:
>> What about using netfilter (with extensions)? We already have iptables
>> module to match on uid or gid. It wouldn't be hard to extend this to
>> other bits of meta data like originating and target containers.
> 
> From reading the man pages the "owner" extension of netfilter would only
> allow to match on outgoing traffic. Would it be possible to extend this
> to also match on incoming traffic? Sorry to be completely ignorant here.

I just realized that the "owner" extension "only" matches on
UID/GID. For what I would like to solve, the match should be on PID.

IIRC the "owner" extension used to support this feature, but it was removed [1].

thanks,
daniel

[1]
http://www.mail-archive.com/git-commits-head@vger.kernel.org/msg00486.html

* Re: [PATCH 1/2] igb: Allow extra 4 bytes on RX for vlan tags.
From: Ben Greear @ 2011-08-25 18:51 UTC
  To: Alexander Duyck
  Cc: jeffrey.t.kirsher, Jesse Gross, netdev@vger.kernel.org,
	Duyck, Alexander H
In-Reply-To: <CAKgT0UfaEEvRTSpu-U+0_oj0KnEkyx5hRAwZDiCAAdtY4YhQUQ@mail.gmail.com>

On 07/20/2011 11:35 PM, Alexander Duyck wrote:
> On Wed, Jul 20, 2011 at 6:21 PM, Jeff Kirsher
> <jeffrey.t.kirsher@intel.com>  wrote:
>> On Wed, 2011-07-20 at 17:27 -0700, Ben Greear wrote:
>>> On 07/20/2011 05:18 PM, Jesse Gross wrote:
>>>> On Thu, Feb 17, 2011 at 9:28 AM, Ben Greear<greearb@candelatech.com>    wrote:
>>>>> On 02/17/2011 03:04 AM, Jeff Kirsher wrote:
>>>>>>
>>>>>> On Thu, Feb 10, 2011 at 13:59,<greearb@candelatech.com>      wrote:
>>>>>>>
>>>>>>> From: Ben Greear<greearb@candelatech.com>
>>>>>>>
>>>>>>> This allows the NIC to receive 1518 byte (not counting
>>>>>>> FCS) packets when MTU is 1500, thus allowing 1500 MTU
>>>>>>> VLAN frames to be received.  Please note that no VLANs
>>>>>>> were actually configured on the NIC...it was just acting
>>>>>>> as pass-through device.
>>>>>>>
>>>>>>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>>>>>>> ---
>>>>>>> :100644 100644 58c665b... 30c9cc6... M  drivers/net/igb/igb_main.c
>>>>>>>    drivers/net/igb/igb_main.c |    5 +++--
>>>>>>>    1 files changed, 3 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
>>>>>>> index 58c665b..30c9cc6 100644
>>>>>>> --- a/drivers/net/igb/igb_main.c
>>>>>>> +++ b/drivers/net/igb/igb_main.c
>>>>>>> @@ -2281,7 +2281,8 @@ static int __devinit igb_sw_init(struct igb_adapter
>>>>>>> *adapter)
>>>>>>>          adapter->rx_itr_setting = IGB_DEFAULT_ITR;
>>>>>>>          adapter->tx_itr_setting = IGB_DEFAULT_ITR;
>>>>>>>
>>>>>>> -       adapter->max_frame_size = netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
>>>>>>> +       adapter->max_frame_size = (netdev->mtu + ETH_HLEN + ETH_FCS_LEN
>>>>>>> +                                  + VLAN_HLEN);
>>>>>>>          adapter->min_frame_size = ETH_ZLEN + ETH_FCS_LEN;
>>>>>>>
>>>>>>>          spin_lock_init(&adapter->stats64_lock);
>>>>>>> @@ -4303,7 +4304,7 @@ static int igb_change_mtu(struct net_device
>>>>>>> *netdev, int new_mtu)
>>>>>>>    {
>>>>>>>          struct igb_adapter *adapter = netdev_priv(netdev);
>>>>>>>          struct pci_dev *pdev = adapter->pdev;
>>>>>>> -       int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;
>>>>>>> +       int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
>>>>>>>          u32 rx_buffer_len, i;
>>>>>>>
>>>>>>>          if ((new_mtu < 68) || (max_frame > MAX_JUMBO_FRAME_SIZE)) {
>>>>>>
>>>>>> While testing this patch, validation found that the patch reduces the
>>>>>> maximum mtu size
>>>>>> by 4 bytes (reduces it from 9216 to 9212).  This is not a desired side
>>>>>> effect of this patch.
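The figures in this exchange follow from simple header arithmetic. A minimal C sketch, assuming the standard Ethernet sizes (ETH_HLEN = 14, ETH_FCS_LEN = 4, VLAN_HLEN = 4) and a MAX_JUMBO_FRAME_SIZE of 9234 that is inferred from the 9216 figure quoted here (9216 + 14 + 4), not copied from the igb sources. The helper names are hypothetical.

```c
#include <assert.h>

/* Standard Ethernet sizes in bytes (same values as the kernel's
 * ETH_HLEN, ETH_FCS_LEN and VLAN_HLEN). */
#define ETH_HLEN      14
#define ETH_FCS_LEN    4
#define VLAN_HLEN      4

/* Assumption: inferred from the thread (9216 + ETH_HLEN + ETH_FCS_LEN),
 * not taken from igb.h. */
#define MAX_JUMBO_FRAME_SIZE 9234

/* Hypothetical: frame size the driver derived from the MTU before the patch. */
static int frame_size_old(int mtu)
{
	return mtu + ETH_HLEN + ETH_FCS_LEN;
}

/* Hypothetical: frame size with the patch, leaving room for one 802.1Q tag. */
static int frame_size_new(int mtu)
{
	return mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
}

/* Largest MTU that passes "max_frame > MAX_JUMBO_FRAME_SIZE" for a
 * given frame-size function (the overhead is the size at MTU 0). */
static int max_mtu(int (*frame_size)(int))
{
	int overhead = frame_size(0);

	assert(overhead > 0 && overhead < MAX_JUMBO_FRAME_SIZE);
	return MAX_JUMBO_FRAME_SIZE - overhead;
}
```

With the patch, a 1500-byte MTU gives 1522 bytes including FCS (1518 without it), and the largest MTU that passes the check drops from 9216 to 9212, i.e. by exactly VLAN_HLEN.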
>>>>>
>>>>> You could add handling for that case and have it act as it used to when
>>>>> new_mtu is greater than 9212?
>>>>>
>>>>> I tested e1000e and it worked w/out hacking at 1500 MTU, so maybe
>>>>> check how it does it?
>>>>
>>>> I just wanted to bring this up again to see if any progress had been
>>>> made.  We were looking at this driver and trying to figure out the
>>>> best way to convert it to use the new vlan model but I'm not familiar
>>>
>>> I've been watching :)
>>>
>>>> enough with the hardware to know.  It seems that all of the other
>>>> Intel drivers unconditionally add space for the vlan tag to the
>>>> receive buffer (and would therefore have similar effects as this
>>>> patch), is there something different about this card?
>>>>
>>>> I believe that Alex was working on something in this area (in the
>>>> context of one of my patches from a long time ago) but I'm not sure
>>>> what came of that.
>>>
>>> Truth is, I don't really see why it's a problem to decrease the
>>> maximum MTU slightly in order to make it work with VLANs.
>>>
>>> I'm not sure if there is some way to make it work with VLANs
>>> and not decrease the maximum MTU.
>>
>> This was the reason this did not get accepted.  I was looking into what
>> could be done so that we did not decrease the maximum MTU, but I got
>> side-tracked and have not done anything on it in several months.
>>
>
> I can take a look at fixing this most likely tomorrow.  I have some
> work planned for igb anyway over the next few days.
>
> Odds are it is just a matter of where the VLAN_HLEN is added.  As I
> recall for our drivers the correct spot is in the setting of
> rx_buffer_len since that is the area more concerned with maximum
> receive frame size versus the mtu section which is more concerned with
> the transmit side of things.
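One way to read this suggestion, sketched in C under stated assumptions: the MTU check keeps its original overhead, so the maximum MTU stays at 9216, while VLAN_HLEN is added only when sizing the receive buffer. The helper names, the MIN_MTU constant and the inferred MAX_JUMBO_FRAME_SIZE of 9234 are hypothetical, and the sketch assumes the hardware accepts a receive buffer VLAN_HLEN bytes larger than the MTU check allows (9238 bytes at the maximum MTU), which is exactly what would need confirming for igb.

```c
#include <assert.h>
#include <stdbool.h>

#define ETH_HLEN      14
#define ETH_FCS_LEN    4
#define VLAN_HLEN      4
#define MIN_MTU       68   /* the lower bound used by igb_change_mtu() above */

/* Assumption: inferred from the 9216 maximum MTU quoted in this thread. */
#define MAX_JUMBO_FRAME_SIZE 9234

/* Hypothetical: MTU validation without VLAN_HLEN, so the
 * user-visible maximum MTU is unchanged. */
static bool mtu_is_valid(int new_mtu)
{
	int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;

	return new_mtu >= MIN_MTU && max_frame <= MAX_JUMBO_FRAME_SIZE;
}

/* Hypothetical: the receive buffer is sized for the largest frame that
 * may arrive, which can carry one 802.1Q tag. */
static int rx_buffer_len_for(int mtu)
{
	assert(mtu_is_valid(mtu));
	return mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
}
```

For example, rx_buffer_len_for(1500) returns 1522, while mtu_is_valid(9216) still succeeds.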

Did a patch for this ever get posted?  I'll be happy to test it
if so...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: [RFT PATCH v3 00/12] Cleanup and extension of netdev features
From: Ben Greear @ 2011-08-25 19:04 UTC
  To: Michał Mirosław; +Cc: netdev, David S. Miller, Ben Hutchings
In-Reply-To: <cover.1308758435.git.mirq-linux@rere.qmqm.pl>

On 06/22/2011 09:04 AM, Michał Mirosław wrote:
> v3 of a feature handling cleanup and extension series. For testing, you
> might want user-space ethtool patched with:
>
> http://patchwork.ozlabs.org/patch/96374/

It looks like this is not in net-next yet...any hope of this
going in soon?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
From: jamal @ 2011-08-25 20:23 UTC
  To: Tom Herbert; +Cc: Johannes Berg, davem, netdev
In-Reply-To: <CA+mtBx8mb0pAqVH19WK=e9LuLeyvnNTybROOURD2Rg6UtYe+bw@mail.gmail.com>

On Thu, 2011-08-25 at 08:29 -0700, Tom Herbert wrote:

> BQL is dynamic, and will increase the queue limit more aggressively
> than decrease it.  So for instance, we can track the largest queue
> needed over 30 seconds, which should be stable even in the
> presence of fluctuating bandwidth.  The thing that worries me is
> rather whether the HW queues conform to the queue characteristics described in
> the patch.  If transmit completions are random and not regular, BQL
> probably can't function well.
> 

I think that's the challenge ;-> I wouldn't say it is random, but if my
understanding is correct the effect is a function of the number of stations
etc.

> If you'd like to bring this up on some wireless devices, that would be
> great; I don't have easy access to any right now, but I can try to
> help otherwise.

I am most curious as well...

cheers,
jamal
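The asymmetric behaviour Tom describes (grow the limit quickly, shrink it slowly by tracking the largest need over a window such as 30 seconds) can be sketched in a few lines of C. This is a hypothetical illustration of the idea only, not the dynamic queue limits code from the BQL series: struct qlimit and its functions are invented for the example, and time is passed in as plain seconds.

```c
#include <assert.h>

/* Hypothetical sketch of an asymmetric queue limit; not the BQL code. */
struct qlimit {
	unsigned int limit;          /* current byte limit */
	unsigned int window_max;     /* largest need seen in the current window */
	unsigned long window_start;  /* window start time, in seconds */
	unsigned long window_len;    /* window length, e.g. 30 seconds */
};

static void qlimit_init(struct qlimit *q, unsigned int initial,
			unsigned long window_len)
{
	assert(window_len > 0);
	q->limit = initial;
	q->window_max = 0;
	q->window_start = 0;
	q->window_len = window_len;
}

/* Record how many bytes had to be queued to keep the device busy at
 * time 'now' (seconds) and adjust the limit. */
static void qlimit_update(struct qlimit *q, unsigned int needed,
			  unsigned long now)
{
	if (needed > q->window_max)
		q->window_max = needed;

	if (needed > q->limit) {
		/* Increase aggressively: adopt the larger need at once. */
		q->limit = needed;
	} else if (now - q->window_start >= q->window_len) {
		/* Decrease conservatively: at most once per window, and only
		 * down to the largest need observed during that window. */
		q->limit = q->window_max;
		q->window_max = needed;
		q->window_start = now;
	}
}
```

Because a decrease happens at most once per window, a brief drop in the required backlog does not immediately shrink the limit. Whether transmit completions on wireless hardware are regular enough for this kind of estimate to settle is the open question in this exchange.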


* Re: Traffic shaping - class ID 16bit limit?
From: jamal @ 2011-08-25 20:26 UTC
  To: Stephen Hemminger; +Cc: Miroslav Kratochvil, netdev
In-Reply-To: <20110825093937.2a8a1457@nehalam.ftrdhcpuser.net>

On Thu, 2011-08-25 at 09:39 -0700, Stephen Hemminger wrote:

> Granted it was a poor choice in the initial design.
> It is wired into the API and changing it would be quite painful.
> 

You should be able to have an effectively unlimited number of queues
if you use the hierarchies, i.e., each hierarchy should expose a new
16-bit namespace.

cheers,
jamal
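A tc handle is a 32-bit value split into a 16-bit major (the qdisc) and a 16-bit minor (the class). Assuming "hierarchies" here means nested qdiscs, each with its own major number, every qdisc brings its own 16-bit minor namespace, so the same minor can be reused under different majors. The sketch copies the TC_H_* handle macros from <linux/pkt_sched.h> so that it builds without kernel headers; classid() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the handle macros from <linux/pkt_sched.h>. */
#define TC_H_MAJ_MASK 0xFFFF0000U
#define TC_H_MIN_MASK 0x0000FFFFU
#define TC_H_MAJ(h)   ((h) & TC_H_MAJ_MASK)
#define TC_H_MIN(h)   ((h) & TC_H_MIN_MASK)
#define TC_H_MAKE(maj, min) (((maj) & TC_H_MAJ_MASK) | ((min) & TC_H_MIN_MASK))

/* Hypothetical helper: build the handle tc prints as "major:minor" (hex). */
static uint32_t classid(uint16_t major, uint16_t minor)
{
	return TC_H_MAKE((uint32_t)major << 16, (uint32_t)minor);
}
```

classid(1, 0x1234) and classid(2, 0x1234) are distinct handles (1:1234 and 2:1234 in tc syntax) that share the same minor number, which is how each qdisc in a hierarchy gets a fresh 16-bit class space.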


* [PATCH net-next 1/1] af_packet: Prefixed tpacket_v3 structs to avoid name space collision
From: Chetan Loke @ 2011-08-25 20:43 UTC
  To: netdev, davem; +Cc: Chetan Loke

The structs introduced in the tpacket_v3 implementation are now prefixed
with 'tpacket_' to avoid namespace collisions.

Compile tested.

Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
---
 include/linux/if_packet.h |   18 ++++----
 net/packet/af_packet.c    |  117 ++++++++++++++++++++++++---------------------
 2 files changed, 71 insertions(+), 64 deletions(-)

diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h
index 5926d59..5e76988 100644
--- a/include/linux/if_packet.h
+++ b/include/linux/if_packet.h
@@ -126,7 +126,7 @@ struct tpacket2_hdr {
 	__u16		tp_padding;
 };
 
-struct hdr_variant1 {
+struct tpacket_hdr_variant1 {
 	__u32	tp_rxhash;
 	__u32	tp_vlan_tci;
 };
@@ -142,11 +142,11 @@ struct tpacket3_hdr {
 	__u16		tp_net;
 	/* pkt_hdr variants */
 	union {
-		struct hdr_variant1 hv1;
+		struct tpacket_hdr_variant1 hv1;
 	};
 };
 
-struct bd_ts {
+struct tpacket_bd_ts {
 	unsigned int ts_sec;
 	union {
 		unsigned int ts_usec;
@@ -154,7 +154,7 @@ struct bd_ts {
 	};
 };
 
-struct hdr_v1 {
+struct tpacket_hdr_v1 {
 	__u32	block_status;
 	__u32	num_pkts;
 	__u32	offset_to_first_pkt;
@@ -200,17 +200,17 @@ struct hdr_v1 {
 	 *			Use the ts of the first packet in the block.
 	 *
 	 */
-	struct bd_ts	ts_first_pkt, ts_last_pkt;
+	struct tpacket_bd_ts	ts_first_pkt, ts_last_pkt;
 };
 
-union bd_header_u {
-	struct hdr_v1 bh1;
+union tpacket_bd_header_u {
+	struct tpacket_hdr_v1 bh1;
 };
 
-struct block_desc {
+struct tpacket_block_desc {
 	__u32 version;
 	__u32 offset_to_priv;
-	union bd_header_u hdr;
+	union tpacket_bd_header_u hdr;
 };
 
 #define TPACKET2_HDRLEN		(TPACKET_ALIGN(sizeof(struct tpacket2_hdr)) + sizeof(struct sockaddr_ll))
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 4371e3a..2ea3d63 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -171,13 +171,13 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
 
 #define V3_ALIGNMENT	(8)
 
-#define BLK_HDR_LEN	(ALIGN(sizeof(struct block_desc), V3_ALIGNMENT))
+#define BLK_HDR_LEN	(ALIGN(sizeof(struct tpacket_block_desc), V3_ALIGNMENT))
 
 #define BLK_PLUS_PRIV(sz_of_priv) \
 	(BLK_HDR_LEN + ALIGN((sz_of_priv), V3_ALIGNMENT))
 
 /* kbdq - kernel block descriptor queue */
-struct kbdq_core {
+struct tpacket_kbdq_core {
 	struct pgv	*pkbdq;
 	unsigned int	feature_req_word;
 	unsigned int	hdrlen;
@@ -230,7 +230,7 @@ struct packet_ring_buffer {
 	unsigned int		pg_vec_pages;
 	unsigned int		pg_vec_len;
 
-	struct kbdq_core	prb_bdqc;
+	struct tpacket_kbdq_core	prb_bdqc;
 	atomic_t		pending;
 };
 
@@ -249,21 +249,25 @@ static void *packet_previous_frame(struct packet_sock *po,
 		struct packet_ring_buffer *rb,
 		int status);
 static void packet_increment_head(struct packet_ring_buffer *buff);
-static int prb_curr_blk_in_use(struct kbdq_core *,
-			struct block_desc *);
-static void *prb_dispatch_next_block(struct kbdq_core *,
+static int prb_curr_blk_in_use(struct tpacket_kbdq_core *,
+			struct tpacket_block_desc *);
+static void *prb_dispatch_next_block(struct tpacket_kbdq_core *,
 			struct packet_sock *);
-static void prb_retire_current_block(struct kbdq_core *,
+static void prb_retire_current_block(struct tpacket_kbdq_core *,
 		struct packet_sock *, unsigned int status);
-static int prb_queue_frozen(struct kbdq_core *);
-static void prb_open_block(struct kbdq_core *, struct block_desc *);
+static int prb_queue_frozen(struct tpacket_kbdq_core *);
+static void prb_open_block(struct tpacket_kbdq_core *,
+		struct tpacket_block_desc *);
 static void prb_retire_rx_blk_timer_expired(unsigned long);
-static void _prb_refresh_rx_retire_blk_timer(struct kbdq_core *);
-static void prb_init_blk_timer(struct packet_sock *, struct kbdq_core *,
-				void (*func) (unsigned long));
-static void prb_fill_rxhash(struct kbdq_core *, struct tpacket3_hdr *);
-static void prb_clear_rxhash(struct kbdq_core *, struct tpacket3_hdr *);
-static void prb_fill_vlan_info(struct kbdq_core *, struct tpacket3_hdr *);
+static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
+static void prb_init_blk_timer(struct packet_sock *,
+		struct tpacket_kbdq_core *,
+		void (*func) (unsigned long));
+static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
+static void prb_clear_rxhash(struct tpacket_kbdq_core *,
+		struct tpacket3_hdr *);
+static void prb_fill_vlan_info(struct tpacket_kbdq_core *,
+		struct tpacket3_hdr *);
 static void packet_flush_mclist(struct sock *sk);
 
 struct packet_fanout;
@@ -322,11 +326,11 @@ struct packet_skb_cb {
 
 #define PACKET_SKB_CB(__skb)	((struct packet_skb_cb *)((__skb)->cb))
 
-#define GET_PBDQC_FROM_RB(x)	((struct kbdq_core *)(&(x)->prb_bdqc))
+#define GET_PBDQC_FROM_RB(x)	((struct tpacket_kbdq_core *)(&(x)->prb_bdqc))
 #define GET_PBLOCK_DESC(x, bid)	\
-	((struct block_desc *)((x)->pkbdq[(bid)].buffer))
+	((struct tpacket_block_desc *)((x)->pkbdq[(bid)].buffer))
 #define GET_CURR_PBLOCK_DESC_FROM_CORE(x)	\
-	((struct block_desc *)((x)->pkbdq[(x)->kactive_blk_num].buffer))
+	((struct tpacket_block_desc *)((x)->pkbdq[(x)->kactive_blk_num].buffer))
 #define GET_NEXT_PRB_BLK_NUM(x) \
 	(((x)->kactive_blk_num < ((x)->knum_blocks-1)) ? \
 	((x)->kactive_blk_num+1) : 0)
@@ -480,7 +484,7 @@ static inline void *packet_current_frame(struct packet_sock *po,
 	return packet_lookup_frame(po, rb, rb->head, status);
 }
 
-static void prb_del_retire_blk_timer(struct kbdq_core *pkc)
+static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
 {
 	del_timer_sync(&pkc->retire_blk_timer);
 }
@@ -489,7 +493,7 @@ static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
 		int tx_ring,
 		struct sk_buff_head *rb_queue)
 {
-	struct kbdq_core *pkc;
+	struct tpacket_kbdq_core *pkc;
 
 	pkc = tx_ring ? &po->tx_ring.prb_bdqc : &po->rx_ring.prb_bdqc;
 
@@ -501,7 +505,7 @@ static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
 }
 
 static void prb_init_blk_timer(struct packet_sock *po,
-		struct kbdq_core *pkc,
+		struct tpacket_kbdq_core *pkc,
 		void (*func) (unsigned long))
 {
 	init_timer(&pkc->retire_blk_timer);
@@ -512,7 +516,7 @@ static void prb_init_blk_timer(struct packet_sock *po,
 
 static void prb_setup_retire_blk_timer(struct packet_sock *po, int tx_ring)
 {
-	struct kbdq_core *pkc;
+	struct tpacket_kbdq_core *pkc;
 
 	if (tx_ring)
 		BUG();
@@ -568,7 +572,7 @@ static int prb_calc_retire_blk_tmo(struct packet_sock *po,
 	return tmo;
 }
 
-static void prb_init_ft_ops(struct kbdq_core *p1,
+static void prb_init_ft_ops(struct tpacket_kbdq_core *p1,
 			union tpacket_req_u *req_u)
 {
 	p1->feature_req_word = req_u->req3.tp_feature_req_word;
@@ -579,14 +583,14 @@ static void init_prb_bdqc(struct packet_sock *po,
 			struct pgv *pg_vec,
 			union tpacket_req_u *req_u, int tx_ring)
 {
-	struct kbdq_core *p1 = &rb->prb_bdqc;
-	struct block_desc *pbd;
+	struct tpacket_kbdq_core *p1 = &rb->prb_bdqc;
+	struct tpacket_block_desc *pbd;
 
 	memset(p1, 0x0, sizeof(*p1));
 
 	p1->knxt_seq_num = 1;
 	p1->pkbdq = pg_vec;
-	pbd = (struct block_desc *)pg_vec[0].buffer;
+	pbd = (struct tpacket_block_desc *)pg_vec[0].buffer;
 	p1->pkblk_start	= (char *)pg_vec[0].buffer;
 	p1->kblk_size = req_u->req3.tp_block_size;
 	p1->knum_blocks	= req_u->req3.tp_block_nr;
@@ -610,7 +614,7 @@ static void init_prb_bdqc(struct packet_sock *po,
 /*  Do NOT update the last_blk_num first.
  *  Assumes sk_buff_head lock is held.
  */
-static void _prb_refresh_rx_retire_blk_timer(struct kbdq_core *pkc)
+static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
 {
 	mod_timer(&pkc->retire_blk_timer,
 			jiffies + pkc->tov_in_jiffies);
@@ -643,9 +647,9 @@ static void _prb_refresh_rx_retire_blk_timer(struct kbdq_core *pkc)
 static void prb_retire_rx_blk_timer_expired(unsigned long data)
 {
 	struct packet_sock *po = (struct packet_sock *)data;
-	struct kbdq_core *pkc = &po->rx_ring.prb_bdqc;
+	struct tpacket_kbdq_core *pkc = &po->rx_ring.prb_bdqc;
 	unsigned int frozen;
-	struct block_desc *pbd;
+	struct tpacket_block_desc *pbd;
 
 	spin_lock(&po->sk.sk_receive_queue.lock);
 
@@ -709,8 +713,8 @@ out:
 	spin_unlock(&po->sk.sk_receive_queue.lock);
 }
 
-static inline void prb_flush_block(struct kbdq_core *pkc1,
-		struct block_desc *pbd1, __u32 status)
+static inline void prb_flush_block(struct tpacket_kbdq_core *pkc1,
+		struct tpacket_block_desc *pbd1, __u32 status)
 {
 	/* Flush everything minus the block header */
 
@@ -752,13 +756,14 @@ static inline void prb_flush_block(struct kbdq_core *pkc1,
  * Note:We DONT refresh the timer on purpose.
  *	Because almost always the next block will be opened.
  */
-static void prb_close_block(struct kbdq_core *pkc1, struct block_desc *pbd1,
+static void prb_close_block(struct tpacket_kbdq_core *pkc1,
+		struct tpacket_block_desc *pbd1,
 		struct packet_sock *po, unsigned int stat)
 {
 	__u32 status = TP_STATUS_USER | stat;
 
 	struct tpacket3_hdr *last_pkt;
-	struct hdr_v1 *h1 = &pbd1->hdr.bh1;
+	struct tpacket_hdr_v1 *h1 = &pbd1->hdr.bh1;
 
 	if (po->stats.tp_drops)
 		status |= TP_STATUS_LOSING;
@@ -786,7 +791,7 @@ static void prb_close_block(struct kbdq_core *pkc1, struct block_desc *pbd1,
 	pkc1->kactive_blk_num = GET_NEXT_PRB_BLK_NUM(pkc1);
 }
 
-static inline void prb_thaw_queue(struct kbdq_core *pkc)
+static inline void prb_thaw_queue(struct tpacket_kbdq_core *pkc)
 {
 	pkc->reset_pending_on_curr_blk = 0;
 }
@@ -798,10 +803,11 @@ static inline void prb_thaw_queue(struct kbdq_core *pkc)
  * 2) retire_blk_timer is refreshed.
  *
  */
-static void prb_open_block(struct kbdq_core *pkc1, struct block_desc *pbd1)
+static void prb_open_block(struct tpacket_kbdq_core *pkc1,
+	struct tpacket_block_desc *pbd1)
 {
 	struct timespec ts;
-	struct hdr_v1 *h1 = &pbd1->hdr.bh1;
+	struct tpacket_hdr_v1 *h1 = &pbd1->hdr.bh1;
 
 	smp_rmb();
 
@@ -861,7 +867,7 @@ static void prb_open_block(struct kbdq_core *pkc1, struct block_desc *pbd1)
  *         case and __packet_lookup_frame_in_block will check if block-0
  *         is free and can now be re-used.
  */
-static inline void prb_freeze_queue(struct kbdq_core *pkc,
+static inline void prb_freeze_queue(struct tpacket_kbdq_core *pkc,
 				  struct packet_sock *po)
 {
 	pkc->reset_pending_on_curr_blk = 1;
@@ -876,10 +882,10 @@ static inline void prb_freeze_queue(struct kbdq_core *pkc,
  * Else, we will freeze the queue.
  * So, caller must check the return value.
  */
-static void *prb_dispatch_next_block(struct kbdq_core *pkc,
+static void *prb_dispatch_next_block(struct tpacket_kbdq_core *pkc,
 		struct packet_sock *po)
 {
-	struct block_desc *pbd;
+	struct tpacket_block_desc *pbd;
 
 	smp_rmb();
 
@@ -901,10 +907,10 @@ static void *prb_dispatch_next_block(struct kbdq_core *pkc,
 	return (void *)pkc->nxt_offset;
 }
 
-static void prb_retire_current_block(struct kbdq_core *pkc,
+static void prb_retire_current_block(struct tpacket_kbdq_core *pkc,
 		struct packet_sock *po, unsigned int status)
 {
-	struct block_desc *pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
+	struct tpacket_block_desc *pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
 
 	/* retire/close the current block */
 	if (likely(TP_STATUS_KERNEL == BLOCK_STATUS(pbd))) {
@@ -932,36 +938,36 @@ static void prb_retire_current_block(struct kbdq_core *pkc,
 	BUG();
 }
 
-static inline int prb_curr_blk_in_use(struct kbdq_core *pkc,
-				      struct block_desc *pbd)
+static inline int prb_curr_blk_in_use(struct tpacket_kbdq_core *pkc,
+				      struct tpacket_block_desc *pbd)
 {
 	return TP_STATUS_USER & BLOCK_STATUS(pbd);
 }
 
-static inline int prb_queue_frozen(struct kbdq_core *pkc)
+static inline int prb_queue_frozen(struct tpacket_kbdq_core *pkc)
 {
 	return pkc->reset_pending_on_curr_blk;
 }
 
 static inline void prb_clear_blk_fill_status(struct packet_ring_buffer *rb)
 {
-	struct kbdq_core *pkc  = GET_PBDQC_FROM_RB(rb);
+	struct tpacket_kbdq_core *pkc  = GET_PBDQC_FROM_RB(rb);
 	atomic_dec(&pkc->blk_fill_in_prog);
 }
 
-static inline void prb_fill_rxhash(struct kbdq_core *pkc,
+static inline void prb_fill_rxhash(struct tpacket_kbdq_core *pkc,
 			struct tpacket3_hdr *ppd)
 {
 	ppd->hv1.tp_rxhash = skb_get_rxhash(pkc->skb);
 }
 
-static inline void prb_clear_rxhash(struct kbdq_core *pkc,
+static inline void prb_clear_rxhash(struct tpacket_kbdq_core *pkc,
 			struct tpacket3_hdr *ppd)
 {
 	ppd->hv1.tp_rxhash = 0;
 }
 
-static inline void prb_fill_vlan_info(struct kbdq_core *pkc,
+static inline void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc,
 			struct tpacket3_hdr *ppd)
 {
 	if (vlan_tx_tag_present(pkc->skb)) {
@@ -972,7 +978,7 @@ static inline void prb_fill_vlan_info(struct kbdq_core *pkc,
 	}
 }
 
-static void prb_run_all_ft_ops(struct kbdq_core *pkc,
+static void prb_run_all_ft_ops(struct tpacket_kbdq_core *pkc,
 			struct tpacket3_hdr *ppd)
 {
 	prb_fill_vlan_info(pkc, ppd);
@@ -983,8 +989,9 @@ static void prb_run_all_ft_ops(struct kbdq_core *pkc,
 		prb_clear_rxhash(pkc, ppd);
 }
 
-static inline void prb_fill_curr_block(char *curr, struct kbdq_core *pkc,
-				struct block_desc *pbd,
+static inline void prb_fill_curr_block(char *curr,
+				struct tpacket_kbdq_core *pkc,
+				struct tpacket_block_desc *pbd,
 				unsigned int len)
 {
 	struct tpacket3_hdr *ppd;
@@ -1006,8 +1013,8 @@ static void *__packet_lookup_frame_in_block(struct packet_sock *po,
 					    unsigned int len
 					    )
 {
-	struct kbdq_core *pkc;
-	struct block_desc *pbd;
+	struct tpacket_kbdq_core *pkc;
+	struct tpacket_block_desc *pbd;
 	char *curr, *end;
 
 	pkc = GET_PBDQC_FROM_RB(((struct packet_ring_buffer *)&po->rx_ring));
@@ -1087,8 +1094,8 @@ static inline void *prb_lookup_block(struct packet_sock *po,
 				     unsigned int previous,
 				     int status)
 {
-	struct kbdq_core *pkc  = GET_PBDQC_FROM_RB(rb);
-	struct block_desc *pbd = GET_PBLOCK_DESC(pkc, previous);
+	struct tpacket_kbdq_core *pkc  = GET_PBDQC_FROM_RB(rb);
+	struct tpacket_block_desc *pbd = GET_PBLOCK_DESC(pkc, previous);
 
 	if (status != BLOCK_STATUS(pbd))
 		return NULL;
-- 
1.7.5.2
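After this rename, user space that walks a TPACKET_V3 block refers to struct tpacket_block_desc and reaches the block header through hdr.bh1, as the kernel code in the patch does. A minimal sketch of such a walk follows; the sketch_ structs are hypothetical, simplified mirrors that keep only the fields used here and do NOT match the real layouts in <linux/if_packet.h> (a real program should include that header and use the tpacket_ types directly). The sketch_ prefix itself illustrates the point of the patch: distinctly prefixed names cannot collide with the header's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified mirrors: field names follow the header,
 * but the layouts are NOT the real ones. */
struct sketch_tpacket3_hdr {
	uint32_t tp_next_offset;   /* bytes to the next packet */
	uint32_t tp_snaplen;       /* captured length */
};

struct sketch_tpacket_hdr_v1 {
	uint32_t block_status;
	uint32_t num_pkts;
	uint32_t offset_to_first_pkt;
};

union sketch_tpacket_bd_header_u {
	struct sketch_tpacket_hdr_v1 bh1;
};

struct sketch_tpacket_block_desc {
	uint32_t version;
	uint32_t offset_to_priv;
	union sketch_tpacket_bd_header_u hdr;
};

/* Walk all packets in one block and return the total captured bytes. */
static uint64_t block_total_snaplen(const uint8_t *block)
{
	const struct sketch_tpacket_block_desc *bd =
		(const struct sketch_tpacket_block_desc *)block;
	const uint8_t *p = block + bd->hdr.bh1.offset_to_first_pkt;
	uint64_t total = 0;
	uint32_t i;

	assert(bd->hdr.bh1.offset_to_first_pkt >= sizeof(*bd));
	for (i = 0; i < bd->hdr.bh1.num_pkts; i++) {
		const struct sketch_tpacket3_hdr *h =
			(const struct sketch_tpacket3_hdr *)p;

		total += h->tp_snaplen;
		p += h->tp_next_offset;   /* advance to the next packet header */
	}
	return total;
}
```

In a real program the block pointer would come from the mmap()ed TPACKET_V3 ring; a zeroed, 8-byte-aligned buffer can stand in for it when exercising the walk.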


* RFC: Remove unnecessary / duplicate OOM printks
From: Joe Perches @ 2011-08-25 20:47 UTC
  To: LKML; +Cc: Eric Dumazet, netdev

There are many thousands of printks for OOM conditions
in kernel sources.

These are almost always a duplication of a generic
OOM message from the mm subsystem.

The biggest difference between the generic OOM message
and the site-specific ones is that most of the specific
messages are emitted at KERN_ERR, while the generic
message is emitted at KERN_WARNING.

Many KB of code/text could be removed from the kernel.

Removal can be gradual and done by subsystem.

Some kmalloc() calls that fall back to vmalloc() on
failure may need to add __GFP_NOWARN.

Does anyone really believe the per-site failure
messages are useful, or really want to keep them?
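In kernel code, the fallback mentioned above typically looks like kmalloc(size, GFP_KERNEL | __GFP_NOWARN) followed by vmalloc(size) if that fails. Kernel APIs cannot run in user space, so here is a hypothetical user-space model of the pattern: MOCK_NOWARN stands in for __GFP_NOWARN, mock_kmalloc() prints the single generic failure warning the allocator would emit (its 4096-byte limit is arbitrary), and alloc_table() adds no per-site message of its own.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are kernel APIs. */
#define MOCK_NOWARN        0x1u   /* plays the role of __GFP_NOWARN */
#define MOCK_KMALLOC_LIMIT 4096u  /* pretend kmalloc fails above this */

static int generic_warnings;      /* number of "generic OOM" messages */

static void *mock_kmalloc(size_t size, unsigned int flags)
{
	if (size > MOCK_KMALLOC_LIMIT) {
		if (!(flags & MOCK_NOWARN)) {
			/* The one generic message, like the mm subsystem's. */
			generic_warnings++;
			fprintf(stderr, "mock: allocation failure, size %zu\n", size);
		}
		return NULL;
	}
	return malloc(size);
}

static void *mock_vmalloc(size_t size)
{
	return malloc(size);
}

/* Try the quiet fast path, fall back, and print nothing site-specific. */
static void *alloc_table(size_t size)
{
	void *p = mock_kmalloc(size, MOCK_NOWARN);

	if (!p)
		p = mock_vmalloc(size);
	return p;
}
```

A quiet attempt that is rescued by the fallback produces no warning at all, while an unflagged failure produces exactly one generic message, with no extra per-site printk duplicating it.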
