* Re: Traffic shaping - class ID 16bit limit?
From: Miroslav Kratochvil @ 2011-08-25 17:06 UTC
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20110825093937.2a8a1457@nehalam.ftrdhcpuser.net>
>> Technically the ClassID seems to be "hardcoded" as a 16bit value, but
>> after some source searching, I haven't found any good reason for it to
>> be 16-bit only.
>
> Granted it was a poor choice in the initial design.
> It is wired into the API and changing it would be quite painful.
>
I had a feeling something like that would come.
If I get it correctly, the API change would consist of:
- some netlink protocol change
- slight modification of qdisc_class_hash
- modifications in all (four?) hierarchical schedulers
- tiny expansion of userspace tc utility
which isn't that painful (except for the CBQ part), but I'm probably
missing something, and presumably the change would take some time to
get mainstream -- probably way more time than writing a hfsc clone
that is controlled using some other interface than tc/netlink. :(
(but hey! I have a topic for school work!)
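For reference, the limit comes from how a class handle is packed: one
32-bit value with the qdisc major number in the upper 16 bits and the
class minor number in the lower 16. A minimal user-space sketch of that
layout follows. The TC_H_* macros mirror the ones in linux/pkt_sched.h
(redefined here so it builds without kernel headers); make_classid and
format_handle are hypothetical helpers written only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same layout as the TC_H_* macros in <linux/pkt_sched.h>. */
#define TC_H_MAJ_MASK 0xFFFF0000U
#define TC_H_MIN_MASK 0x0000FFFFU
#define TC_H_MAJ(h) ((h) & TC_H_MAJ_MASK)
#define TC_H_MIN(h) ((h) & TC_H_MIN_MASK)
#define TC_H_MAKE(maj, min) (((maj) & TC_H_MAJ_MASK) | ((min) & TC_H_MIN_MASK))

/* Hypothetical helper: build a handle from a qdisc major and a class minor.
 * Minor bits above the low 16 are masked off, i.e. silently lost. */
static uint32_t make_classid(uint32_t major, uint32_t minor)
{
	assert(major <= 0xFFFFU);
	return TC_H_MAKE(major << 16, minor);
}

/* Hypothetical helper: print a handle in the "major:minor" hex form. */
static int format_handle(uint32_t handle, char *buf, size_t len)
{
	return snprintf(buf, len, "%x:%x",
			(unsigned)(TC_H_MAJ(handle) >> 16),
			(unsigned)TC_H_MIN(handle));
}
```

With these helpers, make_classid(1, 0xFFFF) gives 0x0001FFFF ("1:ffff"),
while make_classid(1, 0x10000) collapses to 0x00010000 because the 17th
bit of the minor does not fit.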
> You might be able to do the same thing by splitting traffic
> into multiple virtual devices (dummy or ifb) and then doing
> another layer.
>
My scenario looks pretty simple, mostly like a big hashing filter
attached at the device root, flowid'ing the stuff to leaf classes.
Could you please provide some simple illustration of splitting that
into multiple devices? I guess that the main problem with this
approach would be that my subclasses usually don't share anything in
common, especially not any pretty IP prefixes that would allow good
splitting.
Anyway, thanks very much for response!
-mk
* Re: Traffic shaping - class ID 16bit limit?
From: Stephen Hemminger @ 2011-08-25 17:10 UTC
To: Miroslav Kratochvil; +Cc: netdev
In-Reply-To: <CAO0uZ+_6xC0gymfbu28PRK4SaVgkGaSbbe-PgXvZ4h-cPp8k2A@mail.gmail.com>
On Thu, 25 Aug 2011 19:06:58 +0200
Miroslav Kratochvil <exa.exa@gmail.com> wrote:
> >> Technically the ClassID seems to be "hardcoded" as a 16bit value, but
> >> after some source searching, I haven't found any good reason for it to
> >> be 16-bit only.
> >
> > Granted it was a poor choice in the initial design.
> > It is wired into the API and changing it would be quite painful.
> >
>
> I was feeling something like that would come.
>
> If I get it correctly, the API change would consist of:
>
> - some netlink protocol change
> - slight modification of qdisc_class_hash
> - modifications in all (four?) hierarchical schedulers
> - tiny expansion of userspace tc utility
And all the magic compatibility layers to make old code
work with new code.
* Re: [RFC] per-containers tcp buffer limitation
From: Glauber Costa @ 2011-08-25 18:02 UTC
To: Eric W. Biederman
Cc: KAMEZAWA Hiroyuki, Linux Containers, netdev, David Miller,
Pavel Emelyanov
In-Reply-To: <m14o16qlq1.fsf@fess.ebiederm.org>
On 08/24/2011 11:16 PM, Eric W. Biederman wrote:
> KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com> writes:
>
>> On Wed, 24 Aug 2011 22:28:59 -0300
>> Glauber Costa<glommer@parallels.com> wrote:
>>
>>> On 08/24/2011 09:35 PM, Eric W. Biederman wrote:
>>>> Glauber Costa<glommer@parallels.com> writes:
>>> Hi Eric,
>>>
>>> Thanks for your attention.
>>>
>>> So, this that you propose was my first implementation. I ended up
>>> throwing it away after playing with it for a while.
>>>
>>> One of the first problems that arise from that, is that the sysctls are
>>> a tunable visible from inside the container. Those limits, however, are
>>> to be set from the outside world. The code is not much better than that
>>> either, and instead of creating new cgroup structures and linking them
>>> to the protocol, we end up doing it for net ns. We end up increasing
>>> structures just the same...
>
> You don't need to add a netns member to sockets.
But then you have to grow the netns structure itself somehow.
>
> But I do agree that there are odd permission issues with using the
> existing sysctls and making them per namespace.
>
> However almost everything I have seen with memory limits I have found
> very strange. They all seem like a very bad version of disabling memory
> over commits.
More or less. At least from our perspective, the only thing we're really
interested in capping are non-swappable resources. So you could not
overcommit anyway.
For the sockets/tcp case, it is even easier. The code as it is
today already allows you to define soft and hard memory limits: I am just
making it container-wide instead of system-wide.
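To illustrate the soft/hard behaviour described here, below is a
simplified user-space sketch of per-group accounting with two thresholds.
It is not the kernel code: struct buf_limits, charge_pages and
uncharge_pages are hypothetical names, and it assumes pressure is simply
cleared once usage drops back to the soft limit.

```c
#include <assert.h>

/* Hypothetical per-container accounting state; not the kernel's types. */
struct buf_limits {
	long soft;          /* crossing this enters memory pressure */
	long hard;          /* allocations beyond this are refused */
	long used;          /* pages currently charged to the group */
	int under_pressure; /* 1 while usage is above the soft limit */
};

enum charge_result { CHARGE_OK, CHARGE_PRESSURE, CHARGE_DENIED };

/* Try to charge pages to the group; nothing is charged on denial. */
static enum charge_result charge_pages(struct buf_limits *g, long pages)
{
	assert(pages >= 0 && g->soft <= g->hard);
	if (g->used + pages > g->hard)
		return CHARGE_DENIED;
	g->used += pages;
	if (g->used > g->soft) {
		g->under_pressure = 1;
		return CHARGE_PRESSURE;
	}
	return CHARGE_OK;
}

/* Release pages; leave pressure once usage is back at or below soft. */
static void uncharge_pages(struct buf_limits *g, long pages)
{
	assert(pages >= 0 && pages <= g->used);
	g->used -= pages;
	if (g->used <= g->soft)
		g->under_pressure = 0;
}
```

Making the limits container-wide then amounts to keeping one such state
per group instead of a single global one.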
>>> Also, since we're doing resource control, it seems more natural to use
>>> cgroups. Now, the fact that there are no correlation whatsoever between
>>> cgroups and namespaces does bother me. But that's another story, much
>>> more broader and general than this patch.
>>>
>>
>> I think using cgroup makes sense. A question in mind is whether it is
>> better to integrate this kind of 'memory usage' controls to memcg or
>> not.
>
> Maybe. When sockets start getting a cgroup member I start wondering,
> how many cgroup members will sockets potentially belong to.
>
>> How do you think ? IMHO, having cgroup per class of object is messy.
>> ...
>> How about adding
>> memory.tcp_mem
>> to memcg ?
>>
>> Or, adding kmem cgroup ?
>>
>>> About overhead, since this is the first RFC, I did not care about
>>> measuring. However, it seems trivial to me to guarantee that at least
>>> that it won't impose a significant performance penalty when it is
>>> compiled out. If we're moving forward with this implementation, I will
>>> include data in the next release so we can discuss in this basis.
>>>
>>
>> IMHO, you should show performance number even if RFC. Then, people will
>> see patch with more interests.
>
> And also compiled out doesn't really count. Cgroups are something you
> want people to compile into distributions for the common case, and you
> don't want to impose a noticeable performance penalty for the common
> case.
Absolutely agreed.
* Re: [RFC] per-containers tcp buffer limitation
From: Glauber Costa @ 2011-08-25 18:05 UTC
To: KAMEZAWA Hiroyuki
Cc: Eric W. Biederman, Linux Containers, netdev, David Miller,
Pavel Emelyanov
In-Reply-To: <20110825104956.41c4b60e.kamezawa.hiroyu@jp.fujitsu.com>
On 08/24/2011 10:49 PM, KAMEZAWA Hiroyuki wrote:
> On Wed, 24 Aug 2011 22:28:59 -0300
> Glauber Costa<glommer@parallels.com> wrote:
>
>> On 08/24/2011 09:35 PM, Eric W. Biederman wrote:
>>> Glauber Costa<glommer@parallels.com> writes:
>>>
>>>> Hello,
>>>>
>>>> This is a proof of concept of some code I have here to limit tcp send and
>>>> receive buffers per-container (in our case). At this phase, I am more concerned
>>>> in discussing my approach, so please curse my family no further than the 3rd
>>>> generation.
>>>>
>>>> The problem we're trying to attack here, is that buffers can grow and fill
>>>> non-reclaimable kernel memory. When doing containers, we can't afford having a
>>>> malicious container pinning kernel memory at will, therefore exhausting all the
>>>> others.
>>>>
>>>> So here a container will be seen in the host system as a group of tasks, grouped
>>>> in a cgroup. This cgroup will have files allowing us to specify global
>>>> per-cgroup limits on buffers. For that purpose, I created a new sockets cgroup -
>>>> didn't really think any other one of the existing would do here.
>>>>
>>>> As for the network code per-se, I tried to keep the same code that deals with
>>>> memory schedule as a basis and make it per-cgroup.
>>>> You will notice that struct proto now take function pointers to values
>>>> controlling memory pressure and will return per-cgroup data instead of global
>>>> ones. So the current behavior is maintained: after the first threshold is hit,
>>>> we enter memory pressure. After that, allocations are suppressed.
>>>>
>>>> Only tcp code was really touched here. udp had the pointers filled, but we're
>>>> not really controlling anything. But the fact that this lives in generic code,
>>>> makes it easier to do the same for other protocols in the future.
>>>>
>>>> For this patch specifically, I am not touching - just provisioning -
>>>> rmem and wmem specific knobs. I should also #ifdef a lot of this, but hey,
>>>> remember: rfc...
>>>>
>>>> One drawback of this approach I found, is that cgroups does not really work well
>>>> with modules. A lot of the network code is modularized, so this would have to be
>>>> fixed somehow.
>>>>
>>>> Let me know what you think.
>>>
>>> Can you implement this by making the existing network sysctls per
>>> network namespace?
>>>
>>> At a quick skim it looks to me like you can make the existing sysctls
>>> per network namespace and solve the issues you are aiming at solving and
>>> that should make the code much simpler, than your proof of concept code.
>>>
>>> Any implementation of this needs to answer the question how much
>>> overhead does this extra accounting add. I don't have a clue how much
>>> overhead you are adding but you are making structures larger and I
>>> suspect adding at least another cache line miss, so I suspect your
>>> changes will impact real world socket performance.
>>
>> Hi Eric,
>>
>> Thanks for your attention.
>>
>> So, this that you propose was my first implementation. I ended up
>> throwing it away after playing with it for a while.
>>
>> One of the first problems that arise from that, is that the sysctls are
>> a tunable visible from inside the container. Those limits, however, are
>> to be set from the outside world. The code is not much better than that
>> either, and instead of creating new cgroup structures and linking them
>> to the protocol, we end up doing it for net ns. We end up increasing
>> structures just the same...
>>
>> Also, since we're doing resource control, it seems more natural to use
>> cgroups. Now, the fact that there are no correlation whatsoever between
>> cgroups and namespaces does bother me. But that's another story, much
>> more broader and general than this patch.
>>
>
> I think using cgroup makes sense. A question in mind is whether it is
> better to integrate this kind of 'memory usage' controls to memcg or not.
>
> How do you think ? IMHO, having cgroup per class of object is messy.
> ...
> How about adding
> memory.tcp_mem
> to memcg ?
>
> Or, adding kmem cgroup ?
I don't really care which cgroup we use. I chose a new socket one
because sockets are usually not like other objects. People love tweaking
network aspects, and it is not hard to imagine people wanting to extend it.
Now, whether all of this will ever belong in cgroups is of course a
different matter.
Between your two suggestions, I like kmem better. It makes it then
absolutely clear that we will handle kernel objects only...
>> About overhead, since this is the first RFC, I did not care about
>> measuring. However, it seems trivial to me to guarantee that at least
>> that it won't impose a significant performance penalty when it is
>> compiled out. If we're moving forward with this implementation, I will
>> include data in the next release so we can discuss in this basis.
>>
>
> IMHO, you should show performance number even if RFC. Then, people will
> see patch with more interests.
Let's call this one pre-RFC then.
* Re: [RFC] per-containers tcp buffer limitation
From: Glauber Costa @ 2011-08-25 18:11 UTC
To: Stephen Hemminger
Cc: Chris Friesen, Daniel Wagner, Eric W. Biederman,
KAMEZAWA Hiroyuki, Linux Containers, netdev, David Miller,
Pavel Emelyanov
In-Reply-To: <20110825084415.3c3094e8@nehalam.ftrdhcpuser.net>
On 08/25/2011 12:44 PM, Stephen Hemminger wrote:
> You seem to have forgotten the work of your forefathers. When appealing
> to history you must understand it first.
>
> What about using netfilter (with extensions)? We already have iptables
> module to match on uid or gid. It wouldn't be hard to extend this to
> other bits of meta data like originating and target containers.
>
> You could also use this to restrict access to ports and hosts on
> a per container basis.
>
Hello Stephen,
I am pretty sure netfilter can provide us with amazing functionality
that will help our containers implementation a lot.
I don't think, however, that memory limitation belongs in there. First
of all, IIRC, we are not dropping packets, re-routing, dealing with any
low level characteristic, etc. We're just controlling buffer size. This
seems orthogonal to the work of netfilter.
Think, for instance, of the soft limit: when we hit it, we enter a memory
pressure scenario. How would netfilter handle that?
So I guess cgroup is still better suited for this very specific task we
have in mind here. For most of the others, I have no doubt that
netfilter would come in handy.
Thanks for your time!
* Re: [RFC] per-containers tcp buffer limitation
From: Daniel Wagner @ 2011-08-25 18:27 UTC
To: Chris Friesen
Cc: Eric W. Biederman, KAMEZAWA Hiroyuki, Glauber Costa,
Linux Containers, netdev, David Miller, Pavel Emelyanov
In-Reply-To: <4E5664B5.6000806@genband.com>
Hi Chris,
On 08/25/2011 05:05 PM, Chris Friesen wrote:
> On 08/25/2011 06:55 AM, Daniel Wagner wrote:
>
>> I'd like to solve a use case where it is necessary to count all bytes
>> transmitted and received by an application [1]. So far I have found two
>> unsatisfying solution for it. The first one is to hook into libc and
>> count the bytes there. I don't think I have to say I don't like this.
>
> Is there any particular reason you can't use LD_PRELOAD to interpose a
> library to do the statistics monitoring?
This is certainly possible for any dynamically linked application. I
think it wouldn't work for statically linked ones. Currently I don't know
if I will have to deal with such applications on the project I am on. The
reason why I am not such a big fan is that the LD_PRELOAD trick seems
very hackish to me.
As Glauber has argued in this thread, there are probably quite a few
people who want to control or monitor sockets. It seems I am one of
those. Having this kind of support in cgroups seems like a very neat
solution to me.
thanks,
daniel
* Re: [RFC] per-containers tcp buffer limitation
From: Daniel Wagner @ 2011-08-25 18:33 UTC
To: Stephen Hemminger
Cc: Chris Friesen, Eric W. Biederman, KAMEZAWA Hiroyuki,
Glauber Costa, Linux Containers, netdev, David Miller,
Pavel Emelyanov
In-Reply-To: <20110825084415.3c3094e8@nehalam.ftrdhcpuser.net>
Hi Stephen,
On 08/25/2011 05:44 PM, Stephen Hemminger wrote:
> What about using netfilter (with extensions)? We already have iptables
> module to match on uid or gid. It wouldn't be hard to extend this to
> other bits of meta data like originating and target containers.
From reading the man pages, the "owner" extension of netfilter would only
allow to match on outgoing traffic. Would it be possible to extend this
to also match on incoming traffic? Sorry to be completely ignorant here.
thanks,
daniel
* Re: [RFC] per-containers tcp buffer limitation
From: Daniel Wagner @ 2011-08-25 18:45 UTC
To: Stephen Hemminger
Cc: Chris Friesen, Eric W. Biederman, KAMEZAWA Hiroyuki,
Glauber Costa, Linux Containers, netdev, David Miller,
Pavel Emelyanov
In-Reply-To: <4E569571.1080603@monom.org>
Hi Stephen,
> On 08/25/2011 05:44 PM, Stephen Hemminger wrote:
>> What about using netfilter (with extensions)? We already have iptables
>> module to match on uid or gid. It wouldn't be hard to extend this to
>> other bits of meta data like originating and target containers.
>
> From reading the man pages the "owner" extension of netfilter would only
> allow to match on outgoing traffic. Would it be possible to extend this
> to also match on incoming traffic? Sorry to be completely ignorant here.
I just realized that the "owner" extension "only" matches on
UID/GID. For the thing I would like to solve, the match should be on PID.
IIRC the "owner" extension once supported this feature, but it was removed [1].
thanks,
daniel
[1]
http://www.mail-archive.com/git-commits-head@vger.kernel.org/msg00486.html
* Re: [PATCH 1/2] igb: Allow extra 4 bytes on RX for vlan tags.
From: Ben Greear @ 2011-08-25 18:51 UTC
To: Alexander Duyck
Cc: jeffrey.t.kirsher, Jesse Gross, netdev@vger.kernel.org,
Duyck, Alexander H
In-Reply-To: <CAKgT0UfaEEvRTSpu-U+0_oj0KnEkyx5hRAwZDiCAAdtY4YhQUQ@mail.gmail.com>
On 07/20/2011 11:35 PM, Alexander Duyck wrote:
> On Wed, Jul 20, 2011 at 6:21 PM, Jeff Kirsher
> <jeffrey.t.kirsher@intel.com> wrote:
>> On Wed, 2011-07-20 at 17:27 -0700, Ben Greear wrote:
>>> On 07/20/2011 05:18 PM, Jesse Gross wrote:
>>>> On Thu, Feb 17, 2011 at 9:28 AM, Ben Greear<greearb@candelatech.com> wrote:
>>>>> On 02/17/2011 03:04 AM, Jeff Kirsher wrote:
>>>>>>
>>>>>> On Thu, Feb 10, 2011 at 13:59,<greearb@candelatech.com> wrote:
>>>>>>>
>>>>>>> From: Ben Greear<greearb@candelatech.com>
>>>>>>>
>>>>>>> This allows the NIC to receive 1518 byte (not counting
>>>>>>> FCS) packets when MTU is 1500, thus allowing 1500 MTU
>>>>>>> VLAN frames to be received. Please note that no VLANs
>>>>>>> were actually configured on the NIC...it was just acting
>>>>>>> as pass-through device.
>>>>>>>
>>>>>>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>>>>>>> ---
>>>>>>> :100644 100644 58c665b... 30c9cc6... M drivers/net/igb/igb_main.c
>>>>>>> drivers/net/igb/igb_main.c | 5 +++--
>>>>>>> 1 files changed, 3 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
>>>>>>> index 58c665b..30c9cc6 100644
>>>>>>> --- a/drivers/net/igb/igb_main.c
>>>>>>> +++ b/drivers/net/igb/igb_main.c
>>>>>>> @@ -2281,7 +2281,8 @@ static int __devinit igb_sw_init(struct igb_adapter
>>>>>>> *adapter)
>>>>>>> adapter->rx_itr_setting = IGB_DEFAULT_ITR;
>>>>>>> adapter->tx_itr_setting = IGB_DEFAULT_ITR;
>>>>>>>
>>>>>>> - adapter->max_frame_size = netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
>>>>>>> + adapter->max_frame_size = (netdev->mtu + ETH_HLEN + ETH_FCS_LEN
>>>>>>> + + VLAN_HLEN);
>>>>>>> adapter->min_frame_size = ETH_ZLEN + ETH_FCS_LEN;
>>>>>>>
>>>>>>> spin_lock_init(&adapter->stats64_lock);
>>>>>>> @@ -4303,7 +4304,7 @@ static int igb_change_mtu(struct net_device
>>>>>>> *netdev, int new_mtu)
>>>>>>> {
>>>>>>> struct igb_adapter *adapter = netdev_priv(netdev);
>>>>>>> struct pci_dev *pdev = adapter->pdev;
>>>>>>> - int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;
>>>>>>> + int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
>>>>>>> u32 rx_buffer_len, i;
>>>>>>>
>>>>>>> if ((new_mtu< 68) || (max_frame> MAX_JUMBO_FRAME_SIZE)) {
>>>>>>
>>>>>> While testing this patch, validation found that the patch reduces the
>>>>>> maximum mtu size
>>>>>> by 4 bytes (reduces it from 9216 to 9212). This is not a desired side
>>>>>> effect of this patch.
>>>>>
>>>>> You could add handling for that case and have it act as it used to when
>>>>> new_mtu is greater than 9212?
>>>>>
>>>>> I tested e1000e and it worked w/out hacking at 1500 MTU, so maybe
>>>>> check how it does it?
>>>>
>>>> I just wanted to bring this up again to see if any progress had been
>>>> made. We were looking at this driver and trying to figure out the
>>>> best way to convert it to use the new vlan model but I'm not familiar
>>>
>>> I've been watching :)
>>>
>>>> enough with the hardware to know. It seems that all of the other
>>>> Intel drivers unconditionally add space for the vlan tag to the
>>>> receive buffer (and would therefore have similar effects as this
>>>> patch), is there something different about this card?
>>>>
>>>> I believe that Alex was working on something in this area (in the
>>>> context of one of my patches from a long time ago) but I'm not sure
>>>> what came of that.
>>>
>>> Truth is, I don't really see why it's a problem to decrease the
>>> maximum MTU slightly in order to make it work with VLANs.
>>>
>>> I'm not sure if there is some way to make it work with VLANs
>>> and not decrease the maximum MTU.
>>
>> This was the reason this did not get accepted. I was looking into what
>> could be done so that we did not decrease the maximum MTU, but I got
>> side-tracked and have not done anything on it in several months.
>>
>
> I can take a look at fixing this most likely tomorrow. I have some
> work planned for igb anyway over the next few days.
>
> Odds are it is just a matter of where the VLAN_HLEN is added. As I
> recall for our drivers the correct spot is in the setting of
> rx_buffer_len since that is the area more concerned with maximum
> receive frame size versus the mtu section which is more concerned with
> the transmit side of things.
Did a patch for this ever get posted? I'll be happy to test it
if so...
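For anyone following along, the arithmetic behind the 9216 -> 9212 side
effect quoted above can be sketched as below. ETH_HLEN, ETH_FCS_LEN and
VLAN_HLEN carry their usual values; ASSUMED_MAX_FRAME is an assumption,
derived only from the "9216 before the patch" figure in this thread and
not read from the driver, and the function names are made up for the
example.

```c
#include <assert.h>

#define ETH_HLEN    14 /* Ethernet header */
#define ETH_FCS_LEN  4 /* frame check sequence */
#define VLAN_HLEN    4 /* one 802.1Q tag */

/* Frame limit implied by a 9216-byte maximum MTU before the patch.
 * Derived for illustration only; not taken from the igb sources. */
#define ASSUMED_MAX_FRAME (9216 + ETH_HLEN + ETH_FCS_LEN)

/* Frame size as computed before the patch. */
static int max_frame_old(int mtu)
{
	return mtu + ETH_HLEN + ETH_FCS_LEN;
}

/* Frame size as computed by the patch, with room for one VLAN tag. */
static int max_frame_new(int mtu)
{
	return mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
}

/* Largest MTU whose frame still fits under a fixed frame limit. */
static int max_mtu(int limit, int overhead)
{
	assert(limit >= overhead);
	return limit - overhead;
}
```

At MTU 1500 the old formula gives 1518 bytes and the patched one 1522.
Against the same assumed frame limit, the extra 4 bytes of overhead lower
the largest accepted MTU from 9216 to 9212.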
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: [RFT PATCH v3 00/12] Cleanup and extension of netdev features
From: Ben Greear @ 2011-08-25 19:04 UTC
To: Michał Mirosław; +Cc: netdev, David S. Miller, Ben Hutchings
In-Reply-To: <cover.1308758435.git.mirq-linux@rere.qmqm.pl>
On 06/22/2011 09:04 AM, Michał Mirosław wrote:
> v3 of a feature handling cleanup and extension series. For testing, you
> might want user-space ethtool patched with:
>
> http://patchwork.ozlabs.org/patch/96374/
It looks like this is not in net-next yet...any hope of this
going in soon?
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
From: jamal @ 2011-08-25 20:23 UTC
To: Tom Herbert; +Cc: Johannes Berg, davem, netdev
In-Reply-To: <CA+mtBx8mb0pAqVH19WK=e9LuLeyvnNTybROOURD2Rg6UtYe+bw@mail.gmail.com>
On Thu, 2011-08-25 at 08:29 -0700, Tom Herbert wrote:
> BQL is dynamic, and will increase the queue limit more aggressively
> than decrease it. So for instance, we can track the largest queue
> needed over 30 seconds, which should be stable even in the presence
> of fluctuating bandwidth. The thing that worries me is
> whether the HW queues conform to the queue characteristics described in
> the patch. If transmit completions are random and not regular, BQL
> probably can't function well.
>
I think that's the challenge ;-> I wouldn't say it is random, but if my
understanding is correct, the effect is a function of the number of stations,
etc.
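To make the "grow fast, shrink slowly" idea quoted above concrete, here
is a rough sketch. It is not the algorithm from the patch series, only an
illustration of the asymmetry: the limit rises as soon as more is needed
and falls only to the peak seen over a hold period (30 seconds in Tom's
example). struct dyn_limit and dyn_limit_update are hypothetical names,
and time is a plain seconds counter.

```c
#include <assert.h>

/* Hypothetical state for an asymmetric queue limit. */
struct dyn_limit {
	unsigned limit;        /* current queue limit in bytes */
	unsigned window_max;   /* largest need seen since window_start */
	unsigned window_start; /* start of the current window, in seconds */
	unsigned hold;         /* seconds before the limit may shrink */
};

/* Feed one observation: bytes needed right now, at time "now" (seconds). */
static void dyn_limit_update(struct dyn_limit *d, unsigned needed, unsigned now)
{
	assert(now >= d->window_start);
	if (needed > d->window_max)
		d->window_max = needed;
	if (needed > d->limit) {
		/* Grow immediately and restart the window. */
		d->limit = needed;
		d->window_max = needed;
		d->window_start = now;
	} else if (now - d->window_start >= d->hold) {
		/* Shrink only to the peak seen over the last window. */
		d->limit = d->window_max;
		d->window_max = 0;
		d->window_start = now;
	}
}
```

If completions arrive irregularly, "needed" itself becomes noisy, and the
window then ends up tracking the noise rather than the link.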
> If you'd like to bring this up on some wireless devices that would be
> great, I don't have easy access to any right now, but I can try to
> help otherwise.
I am most curious as well...
cheers,
jamal
* Re: Traffic shaping - class ID 16bit limit?
From: jamal @ 2011-08-25 20:26 UTC
To: Stephen Hemminger; +Cc: Miroslav Kratochvil, netdev
In-Reply-To: <20110825093937.2a8a1457@nehalam.ftrdhcpuser.net>
On Thu, 2011-08-25 at 09:39 -0700, Stephen Hemminger wrote:
> Granted it was a poor choice in the initial design.
> It is wired into the API and changing it would be quite painful.
>
You should be able to have infinite queues if you use the
hierarchies, i.e. each hierarchy should expose a new 16-bit
namespace.
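One possible reading of this, as a rough sketch: spread a flat class
index over several qdiscs, each with its own major number and its own
16-bit minor space. The mapping below is an assumption made for
illustration (struct class_addr and flat_to_class are hypothetical, minor
0 is skipped because major:0 names the qdisc itself, and majors above
0xFFFE are avoided); it says nothing about how filters would steer
packets into the child qdiscs.

```c
#include <assert.h>
#include <stdint.h>

/* Minors 1..0xFFFF are usable per qdisc; minor 0 names the qdisc itself. */
#define CLASSES_PER_QDISC 0xFFFFU

/* Hypothetical (major, minor) pair for one leaf class. */
struct class_addr {
	uint16_t major;
	uint16_t minor;
};

/* Map a 0-based flat index onto qdisc majors starting at first_major. */
static struct class_addr flat_to_class(uint32_t index, uint16_t first_major)
{
	struct class_addr a;
	uint32_t major = first_major + index / CLASSES_PER_QDISC;

	assert(major <= 0xFFFEU); /* assumed upper bound on usable majors */
	a.major = (uint16_t)major;
	a.minor = (uint16_t)(index % CLASSES_PER_QDISC + 1);
	return a;
}
```

With first_major 0x10, index 65534 lands on 10:ffff and index 65535 rolls
over to 11:1.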
cheers,
jamal
* [PATCH net-next 1/1] af_packet: Prefixed tpacket_v3 structs to avoid name space collision
From: Chetan Loke @ 2011-08-25 20:43 UTC
To: netdev, davem; +Cc: Chetan Loke
Structs introduced in the tpacket_v3 implementation are now prefixed with
'tpacket' to avoid namespace collisions.
Compile tested.
Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
---
include/linux/if_packet.h | 18 ++++----
net/packet/af_packet.c | 117 ++++++++++++++++++++++++---------------------
2 files changed, 71 insertions(+), 64 deletions(-)
diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h
index 5926d59..5e76988 100644
--- a/include/linux/if_packet.h
+++ b/include/linux/if_packet.h
@@ -126,7 +126,7 @@ struct tpacket2_hdr {
__u16 tp_padding;
};
-struct hdr_variant1 {
+struct tpacket_hdr_variant1 {
__u32 tp_rxhash;
__u32 tp_vlan_tci;
};
@@ -142,11 +142,11 @@ struct tpacket3_hdr {
__u16 tp_net;
/* pkt_hdr variants */
union {
- struct hdr_variant1 hv1;
+ struct tpacket_hdr_variant1 hv1;
};
};
-struct bd_ts {
+struct tpacket_bd_ts {
unsigned int ts_sec;
union {
unsigned int ts_usec;
@@ -154,7 +154,7 @@ struct bd_ts {
};
};
-struct hdr_v1 {
+struct tpacket_hdr_v1 {
__u32 block_status;
__u32 num_pkts;
__u32 offset_to_first_pkt;
@@ -200,17 +200,17 @@ struct hdr_v1 {
* Use the ts of the first packet in the block.
*
*/
- struct bd_ts ts_first_pkt, ts_last_pkt;
+ struct tpacket_bd_ts ts_first_pkt, ts_last_pkt;
};
-union bd_header_u {
- struct hdr_v1 bh1;
+union tpacket_bd_header_u {
+ struct tpacket_hdr_v1 bh1;
};
-struct block_desc {
+struct tpacket_block_desc {
__u32 version;
__u32 offset_to_priv;
- union bd_header_u hdr;
+ union tpacket_bd_header_u hdr;
};
#define TPACKET2_HDRLEN (TPACKET_ALIGN(sizeof(struct tpacket2_hdr)) + sizeof(struct sockaddr_ll))
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 4371e3a..2ea3d63 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -171,13 +171,13 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
#define V3_ALIGNMENT (8)
-#define BLK_HDR_LEN (ALIGN(sizeof(struct block_desc), V3_ALIGNMENT))
+#define BLK_HDR_LEN (ALIGN(sizeof(struct tpacket_block_desc), V3_ALIGNMENT))
#define BLK_PLUS_PRIV(sz_of_priv) \
(BLK_HDR_LEN + ALIGN((sz_of_priv), V3_ALIGNMENT))
/* kbdq - kernel block descriptor queue */
-struct kbdq_core {
+struct tpacket_kbdq_core {
struct pgv *pkbdq;
unsigned int feature_req_word;
unsigned int hdrlen;
@@ -230,7 +230,7 @@ struct packet_ring_buffer {
unsigned int pg_vec_pages;
unsigned int pg_vec_len;
- struct kbdq_core prb_bdqc;
+ struct tpacket_kbdq_core prb_bdqc;
atomic_t pending;
};
@@ -249,21 +249,25 @@ static void *packet_previous_frame(struct packet_sock *po,
struct packet_ring_buffer *rb,
int status);
static void packet_increment_head(struct packet_ring_buffer *buff);
-static int prb_curr_blk_in_use(struct kbdq_core *,
- struct block_desc *);
-static void *prb_dispatch_next_block(struct kbdq_core *,
+static int prb_curr_blk_in_use(struct tpacket_kbdq_core *,
+ struct tpacket_block_desc *);
+static void *prb_dispatch_next_block(struct tpacket_kbdq_core *,
struct packet_sock *);
-static void prb_retire_current_block(struct kbdq_core *,
+static void prb_retire_current_block(struct tpacket_kbdq_core *,
struct packet_sock *, unsigned int status);
-static int prb_queue_frozen(struct kbdq_core *);
-static void prb_open_block(struct kbdq_core *, struct block_desc *);
+static int prb_queue_frozen(struct tpacket_kbdq_core *);
+static void prb_open_block(struct tpacket_kbdq_core *,
+ struct tpacket_block_desc *);
static void prb_retire_rx_blk_timer_expired(unsigned long);
-static void _prb_refresh_rx_retire_blk_timer(struct kbdq_core *);
-static void prb_init_blk_timer(struct packet_sock *, struct kbdq_core *,
- void (*func) (unsigned long));
-static void prb_fill_rxhash(struct kbdq_core *, struct tpacket3_hdr *);
-static void prb_clear_rxhash(struct kbdq_core *, struct tpacket3_hdr *);
-static void prb_fill_vlan_info(struct kbdq_core *, struct tpacket3_hdr *);
+static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *);
+static void prb_init_blk_timer(struct packet_sock *,
+ struct tpacket_kbdq_core *,
+ void (*func) (unsigned long));
+static void prb_fill_rxhash(struct tpacket_kbdq_core *, struct tpacket3_hdr *);
+static void prb_clear_rxhash(struct tpacket_kbdq_core *,
+ struct tpacket3_hdr *);
+static void prb_fill_vlan_info(struct tpacket_kbdq_core *,
+ struct tpacket3_hdr *);
static void packet_flush_mclist(struct sock *sk);
struct packet_fanout;
@@ -322,11 +326,11 @@ struct packet_skb_cb {
#define PACKET_SKB_CB(__skb) ((struct packet_skb_cb *)((__skb)->cb))
-#define GET_PBDQC_FROM_RB(x) ((struct kbdq_core *)(&(x)->prb_bdqc))
+#define GET_PBDQC_FROM_RB(x) ((struct tpacket_kbdq_core *)(&(x)->prb_bdqc))
#define GET_PBLOCK_DESC(x, bid) \
- ((struct block_desc *)((x)->pkbdq[(bid)].buffer))
+ ((struct tpacket_block_desc *)((x)->pkbdq[(bid)].buffer))
#define GET_CURR_PBLOCK_DESC_FROM_CORE(x) \
- ((struct block_desc *)((x)->pkbdq[(x)->kactive_blk_num].buffer))
+ ((struct tpacket_block_desc *)((x)->pkbdq[(x)->kactive_blk_num].buffer))
#define GET_NEXT_PRB_BLK_NUM(x) \
(((x)->kactive_blk_num < ((x)->knum_blocks-1)) ? \
((x)->kactive_blk_num+1) : 0)
@@ -480,7 +484,7 @@ static inline void *packet_current_frame(struct packet_sock *po,
return packet_lookup_frame(po, rb, rb->head, status);
}
-static void prb_del_retire_blk_timer(struct kbdq_core *pkc)
+static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
{
del_timer_sync(&pkc->retire_blk_timer);
}
@@ -489,7 +493,7 @@ static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
int tx_ring,
struct sk_buff_head *rb_queue)
{
- struct kbdq_core *pkc;
+ struct tpacket_kbdq_core *pkc;
pkc = tx_ring ? &po->tx_ring.prb_bdqc : &po->rx_ring.prb_bdqc;
@@ -501,7 +505,7 @@ static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
}
static void prb_init_blk_timer(struct packet_sock *po,
- struct kbdq_core *pkc,
+ struct tpacket_kbdq_core *pkc,
void (*func) (unsigned long))
{
init_timer(&pkc->retire_blk_timer);
@@ -512,7 +516,7 @@ static void prb_init_blk_timer(struct packet_sock *po,
static void prb_setup_retire_blk_timer(struct packet_sock *po, int tx_ring)
{
- struct kbdq_core *pkc;
+ struct tpacket_kbdq_core *pkc;
if (tx_ring)
BUG();
@@ -568,7 +572,7 @@ static int prb_calc_retire_blk_tmo(struct packet_sock *po,
return tmo;
}
-static void prb_init_ft_ops(struct kbdq_core *p1,
+static void prb_init_ft_ops(struct tpacket_kbdq_core *p1,
union tpacket_req_u *req_u)
{
p1->feature_req_word = req_u->req3.tp_feature_req_word;
@@ -579,14 +583,14 @@ static void init_prb_bdqc(struct packet_sock *po,
struct pgv *pg_vec,
union tpacket_req_u *req_u, int tx_ring)
{
- struct kbdq_core *p1 = &rb->prb_bdqc;
- struct block_desc *pbd;
+ struct tpacket_kbdq_core *p1 = &rb->prb_bdqc;
+ struct tpacket_block_desc *pbd;
memset(p1, 0x0, sizeof(*p1));
p1->knxt_seq_num = 1;
p1->pkbdq = pg_vec;
- pbd = (struct block_desc *)pg_vec[0].buffer;
+ pbd = (struct tpacket_block_desc *)pg_vec[0].buffer;
p1->pkblk_start = (char *)pg_vec[0].buffer;
p1->kblk_size = req_u->req3.tp_block_size;
p1->knum_blocks = req_u->req3.tp_block_nr;
@@ -610,7 +614,7 @@ static void init_prb_bdqc(struct packet_sock *po,
/* Do NOT update the last_blk_num first.
* Assumes sk_buff_head lock is held.
*/
-static void _prb_refresh_rx_retire_blk_timer(struct kbdq_core *pkc)
+static void _prb_refresh_rx_retire_blk_timer(struct tpacket_kbdq_core *pkc)
{
mod_timer(&pkc->retire_blk_timer,
jiffies + pkc->tov_in_jiffies);
@@ -643,9 +647,9 @@ static void _prb_refresh_rx_retire_blk_timer(struct kbdq_core *pkc)
static void prb_retire_rx_blk_timer_expired(unsigned long data)
{
struct packet_sock *po = (struct packet_sock *)data;
- struct kbdq_core *pkc = &po->rx_ring.prb_bdqc;
+ struct tpacket_kbdq_core *pkc = &po->rx_ring.prb_bdqc;
unsigned int frozen;
- struct block_desc *pbd;
+ struct tpacket_block_desc *pbd;
spin_lock(&po->sk.sk_receive_queue.lock);
@@ -709,8 +713,8 @@ out:
spin_unlock(&po->sk.sk_receive_queue.lock);
}
-static inline void prb_flush_block(struct kbdq_core *pkc1,
- struct block_desc *pbd1, __u32 status)
+static inline void prb_flush_block(struct tpacket_kbdq_core *pkc1,
+ struct tpacket_block_desc *pbd1, __u32 status)
{
/* Flush everything minus the block header */
@@ -752,13 +756,14 @@ static inline void prb_flush_block(struct kbdq_core *pkc1,
* Note:We DONT refresh the timer on purpose.
* Because almost always the next block will be opened.
*/
-static void prb_close_block(struct kbdq_core *pkc1, struct block_desc *pbd1,
+static void prb_close_block(struct tpacket_kbdq_core *pkc1,
+ struct tpacket_block_desc *pbd1,
struct packet_sock *po, unsigned int stat)
{
__u32 status = TP_STATUS_USER | stat;
struct tpacket3_hdr *last_pkt;
- struct hdr_v1 *h1 = &pbd1->hdr.bh1;
+ struct tpacket_hdr_v1 *h1 = &pbd1->hdr.bh1;
if (po->stats.tp_drops)
status |= TP_STATUS_LOSING;
@@ -786,7 +791,7 @@ static void prb_close_block(struct kbdq_core *pkc1, struct block_desc *pbd1,
pkc1->kactive_blk_num = GET_NEXT_PRB_BLK_NUM(pkc1);
}
-static inline void prb_thaw_queue(struct kbdq_core *pkc)
+static inline void prb_thaw_queue(struct tpacket_kbdq_core *pkc)
{
pkc->reset_pending_on_curr_blk = 0;
}
@@ -798,10 +803,11 @@ static inline void prb_thaw_queue(struct kbdq_core *pkc)
* 2) retire_blk_timer is refreshed.
*
*/
-static void prb_open_block(struct kbdq_core *pkc1, struct block_desc *pbd1)
+static void prb_open_block(struct tpacket_kbdq_core *pkc1,
+ struct tpacket_block_desc *pbd1)
{
struct timespec ts;
- struct hdr_v1 *h1 = &pbd1->hdr.bh1;
+ struct tpacket_hdr_v1 *h1 = &pbd1->hdr.bh1;
smp_rmb();
@@ -861,7 +867,7 @@ static void prb_open_block(struct kbdq_core *pkc1, struct block_desc *pbd1)
* case and __packet_lookup_frame_in_block will check if block-0
* is free and can now be re-used.
*/
-static inline void prb_freeze_queue(struct kbdq_core *pkc,
+static inline void prb_freeze_queue(struct tpacket_kbdq_core *pkc,
struct packet_sock *po)
{
pkc->reset_pending_on_curr_blk = 1;
@@ -876,10 +882,10 @@ static inline void prb_freeze_queue(struct kbdq_core *pkc,
* Else, we will freeze the queue.
* So, caller must check the return value.
*/
-static void *prb_dispatch_next_block(struct kbdq_core *pkc,
+static void *prb_dispatch_next_block(struct tpacket_kbdq_core *pkc,
struct packet_sock *po)
{
- struct block_desc *pbd;
+ struct tpacket_block_desc *pbd;
smp_rmb();
@@ -901,10 +907,10 @@ static void *prb_dispatch_next_block(struct kbdq_core *pkc,
return (void *)pkc->nxt_offset;
}
-static void prb_retire_current_block(struct kbdq_core *pkc,
+static void prb_retire_current_block(struct tpacket_kbdq_core *pkc,
struct packet_sock *po, unsigned int status)
{
- struct block_desc *pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
+ struct tpacket_block_desc *pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
/* retire/close the current block */
if (likely(TP_STATUS_KERNEL == BLOCK_STATUS(pbd))) {
@@ -932,36 +938,36 @@ static void prb_retire_current_block(struct kbdq_core *pkc,
BUG();
}
-static inline int prb_curr_blk_in_use(struct kbdq_core *pkc,
- struct block_desc *pbd)
+static inline int prb_curr_blk_in_use(struct tpacket_kbdq_core *pkc,
+ struct tpacket_block_desc *pbd)
{
return TP_STATUS_USER & BLOCK_STATUS(pbd);
}
-static inline int prb_queue_frozen(struct kbdq_core *pkc)
+static inline int prb_queue_frozen(struct tpacket_kbdq_core *pkc)
{
return pkc->reset_pending_on_curr_blk;
}
static inline void prb_clear_blk_fill_status(struct packet_ring_buffer *rb)
{
- struct kbdq_core *pkc = GET_PBDQC_FROM_RB(rb);
+ struct tpacket_kbdq_core *pkc = GET_PBDQC_FROM_RB(rb);
atomic_dec(&pkc->blk_fill_in_prog);
}
-static inline void prb_fill_rxhash(struct kbdq_core *pkc,
+static inline void prb_fill_rxhash(struct tpacket_kbdq_core *pkc,
struct tpacket3_hdr *ppd)
{
ppd->hv1.tp_rxhash = skb_get_rxhash(pkc->skb);
}
-static inline void prb_clear_rxhash(struct kbdq_core *pkc,
+static inline void prb_clear_rxhash(struct tpacket_kbdq_core *pkc,
struct tpacket3_hdr *ppd)
{
ppd->hv1.tp_rxhash = 0;
}
-static inline void prb_fill_vlan_info(struct kbdq_core *pkc,
+static inline void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc,
struct tpacket3_hdr *ppd)
{
if (vlan_tx_tag_present(pkc->skb)) {
@@ -972,7 +978,7 @@ static inline void prb_fill_vlan_info(struct kbdq_core *pkc,
}
}
-static void prb_run_all_ft_ops(struct kbdq_core *pkc,
+static void prb_run_all_ft_ops(struct tpacket_kbdq_core *pkc,
struct tpacket3_hdr *ppd)
{
prb_fill_vlan_info(pkc, ppd);
@@ -983,8 +989,9 @@ static void prb_run_all_ft_ops(struct kbdq_core *pkc,
prb_clear_rxhash(pkc, ppd);
}
-static inline void prb_fill_curr_block(char *curr, struct kbdq_core *pkc,
- struct block_desc *pbd,
+static inline void prb_fill_curr_block(char *curr,
+ struct tpacket_kbdq_core *pkc,
+ struct tpacket_block_desc *pbd,
unsigned int len)
{
struct tpacket3_hdr *ppd;
@@ -1006,8 +1013,8 @@ static void *__packet_lookup_frame_in_block(struct packet_sock *po,
unsigned int len
)
{
- struct kbdq_core *pkc;
- struct block_desc *pbd;
+ struct tpacket_kbdq_core *pkc;
+ struct tpacket_block_desc *pbd;
char *curr, *end;
pkc = GET_PBDQC_FROM_RB(((struct packet_ring_buffer *)&po->rx_ring));
@@ -1087,8 +1094,8 @@ static inline void *prb_lookup_block(struct packet_sock *po,
unsigned int previous,
int status)
{
- struct kbdq_core *pkc = GET_PBDQC_FROM_RB(rb);
- struct block_desc *pbd = GET_PBLOCK_DESC(pkc, previous);
+ struct tpacket_kbdq_core *pkc = GET_PBDQC_FROM_RB(rb);
+ struct tpacket_block_desc *pbd = GET_PBLOCK_DESC(pkc, previous);
if (status != BLOCK_STATUS(pbd))
return NULL;
--
1.7.5.2
^ permalink raw reply related
* RFC: Remove unnecessary / duplicate OOM printks
From: Joe Perches @ 2011-08-25 20:47 UTC (permalink / raw)
To: LKML; +Cc: Eric Dumazet, netdev
There are many thousands of printks for OOM conditions
in kernel sources.
These are almost always a duplication of a generic
OOM message from the mm subsystem.
The biggest difference between the generic OOM and
the specific OOM uses is that most of the specific
messages are emitted at KERN_ERR but the generic
message is at KERN_WARNING.
Many KB of code/text could be removed from the kernel.
Removal can be gradual and done by subsystem.
Some kmallocs that are followed on failure by vmalloc
may need to add __GFP_NOWARN.
Does anyone really believe the per site failure
messages are useful, or really want to keep them?
^ permalink raw reply
* is SO_PRIORITY still supposed to affect the TOS field?
From: Rick Jones @ 2011-08-25 21:52 UTC (permalink / raw)
To: netdev-u79uwXL29TY76Z2rM5mHXA
Cc: dave.taht-Re5JQEeQqe8AvxtiuMwx3w,
amirv-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
mkt.manpages-Re5JQEeQqe8AvxtiuMwx3w,
linux-man-u79uwXL29TY76Z2rM5mHXA
Amir Vadai recently sent me some patches for netperf to implement
setting SO_PRIORITY. I made changes to netperf based on those patches,
(now in top of trunk on netperf.org) and between some emails with me,
him and Dave Taht and some quick messing around while watching tcpdump,
I'm wondering if this:
> SO_PRIORITY
> Set the protocol-defined priority for all packets to be sent on
> this socket. Linux uses this value to order the networking
> queues: packets with a higher priority may be processed first
> depending on the selected device queueing discipline. For
> ip(7), this also sets the IP type-of-service (TOS) field for
> outgoing packets. Setting a priority outside the range 0 to 6
> requires the CAP_NET_ADMIN capability.
from the socket(7) manpage is still accurate, because even after setting
a number of different values for SO_PRIORITY I didn't see the TOS field as
anything other than 0 in tcpdump output (confirming something Dave Taht
saw).
happy benchmarking,
rick jones
^ permalink raw reply
* Re: RFC: Remove unnecessary / duplicate OOM printks
From: Andrew Morton @ 2011-08-25 22:04 UTC (permalink / raw)
To: Joe Perches; +Cc: LKML, Eric Dumazet, netdev
In-Reply-To: <1314305243.4637.10.camel@Joe-Laptop>
On Thu, 25 Aug 2011 13:47:23 -0700
Joe Perches <joe@perches.com> wrote:
> There are many thousands of printks for OOM conditions
> in kernel sources.
>
> These are almost always a duplication of a generic
> OOM message from the mm subsystem.
>
> The biggest difference between the generic OOM and
> the specific OOM uses is that most of the specific
> messages are emitted at KERN_ERR but the generic
> message is at KERN_WARNING.
>
> Many KB of code/text could be removed from the kernel.
>
> Removal can be gradual and done by subsystem.
>
> Some kmalloc's that are followed on failure by vmalloc
> may need to add GFP_NOWARN.
>
> Does anyone really believe the per site failure
> messages are useful, or really want to keep them?
>
Kill 'em.
^ permalink raw reply
* [RFC PATCH] caif: Remove OOM messages, use kzalloc
From: Joe Perches @ 2011-08-25 23:22 UTC (permalink / raw)
To: Andrew Morton, Sjur Braendeland
Cc: LKML, Eric Dumazet, netdev, David S. Miller
In-Reply-To: <20110825150456.fc8720c4.akpm@linux-foundation.org>
Remove per site OOM messages because they duplicate
the generic mm subsystem OOM message.
Use kzalloc instead of kmalloc/memset
when next to the OOM message removals.
Reduces object size (allyesconfig ~2%)
$ size -t drivers/net/caif/built-in.o.old net/caif/built-in.o.old
text data bss dec hex filename
32297 700 8224 41221 a105 drivers/net/caif/built-in.o.old
72159 1317 20552 94028 16f4c net/caif/built-in.o.old
104456 2017 28776 135249 21051 (TOTALS)
$ size -t drivers/net/caif/built-in.o.new net/caif/built-in.o.new
text data bss dec hex filename
31975 700 8184 40859 9f9b drivers/net/caif/built-in.o.new
70748 1317 20152 92217 16839 net/caif/built-in.o.new
102723 2017 28336 133076 207d4 (TOTALS)
Signed-off-by: Joe Perches <joe@perches.com>
---
This is a sample of OOM message removal.
On Thu, 2011-08-25 at 15:04 -0700, Andrew Morton wrote:
> On Thu, 25 Aug 2011 13:47:23 -0700 Joe Perches <joe@perches.com> wrote:
> > There are many thousands of printks for OOM conditions
> > in kernel sources.
> > These are almost always a duplication of a generic
> > OOM message from the mm subsystem.
> > The biggest difference between the generic OOM and
> > the specific OOM uses is that most of the specific
> > messages are emitted at KERN_ERR but the generic
> > message is at KERN_WARNING.
> > Many KB of code/text could be removed from the kernel.
> > Removal can be gradual and done by subsystem.
> > Some kmalloc's that are followed on failure by vmalloc
> > may need to add GFP_NOWARN.
> > Does anyone really believe the per site failure
> > messages are useful, or really want to keep them?
> Kill 'em.
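
For readers who want to see the shape of the change outside the kernel, here
is a userspace sketch of the same transformation. It is an analogy only:
malloc/memset stands in for kmalloc/memset, calloc stands in for kzalloc, and
struct demo_layer and both function names are invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CAIF layer object; not the kernel's struct. */
struct demo_layer {
	int id;
	void *up;
	void *dn;
};

/* Old shape: allocate, check, then zero by hand (kmalloc + memset). */
static struct demo_layer *demo_create_old(int id)
{
	struct demo_layer *l = malloc(sizeof(*l));

	if (!l)
		return NULL;	/* the per-site OOM message used to sit here */
	memset(l, 0, sizeof(*l));
	l->id = id;
	return l;
}

/* New shape: a zeroing allocator (calloc here, kzalloc in the kernel). */
static struct demo_layer *demo_create_new(int id)
{
	struct demo_layer *l = calloc(1, sizeof(*l));

	if (!l)
		return NULL;
	l->id = id;
	return l;
}
```

Both functions hand back an object in the same state; the second simply has
one call fewer and nothing to print on failure.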
drivers/net/caif/caif_hsi.c | 9 +--------
drivers/net/caif/caif_spi.c | 4 ----
net/caif/cfcnfg.c | 15 +++------------
net/caif/cfctrl.c | 23 ++++++-----------------
net/caif/cfdbgl.c | 7 ++-----
net/caif/cfdgml.c | 7 ++-----
net/caif/cffrml.c | 7 ++-----
net/caif/cfrfml.c | 7 ++-----
net/caif/cfserl.c | 7 ++-----
net/caif/cfsrvl.c | 8 ++------
net/caif/cfutill.c | 7 ++-----
net/caif/cfveil.c | 7 ++-----
net/caif/cfvidl.c | 7 ++-----
13 files changed, 28 insertions(+), 87 deletions(-)
diff --git a/drivers/net/caif/caif_hsi.c b/drivers/net/caif/caif_hsi.c
index b41c2fc..2fcabba 100644
--- a/drivers/net/caif/caif_hsi.c
+++ b/drivers/net/caif/caif_hsi.c
@@ -937,11 +937,8 @@ int cfhsi_probe(struct platform_device *pdev)
int res;
ndev = alloc_netdev(sizeof(struct cfhsi), "cfhsi%d", cfhsi_setup);
- if (!ndev) {
- dev_err(&pdev->dev, "%s: alloc_netdev failed.\n",
- __func__);
+ if (!ndev)
return -ENODEV;
- }
cfhsi = netdev_priv(ndev);
cfhsi->ndev = ndev;
@@ -969,8 +966,6 @@ int cfhsi_probe(struct platform_device *pdev)
*/
cfhsi->tx_buf = kzalloc(CFHSI_BUF_SZ_TX, GFP_KERNEL);
if (!cfhsi->tx_buf) {
- dev_err(&ndev->dev, "%s: Failed to allocate TX buffer.\n",
- __func__);
res = -ENODEV;
goto err_alloc_tx;
}
@@ -981,8 +976,6 @@ int cfhsi_probe(struct platform_device *pdev)
*/
cfhsi->rx_buf = kzalloc(CFHSI_BUF_SZ_RX, GFP_KERNEL);
if (!cfhsi->rx_buf) {
- dev_err(&ndev->dev, "%s: Failed to allocate RX buffer.\n",
- __func__);
res = -ENODEV;
goto err_alloc_rx;
}
diff --git a/drivers/net/caif/caif_spi.c b/drivers/net/caif/caif_spi.c
index 0f8defc..05e791f 100644
--- a/drivers/net/caif/caif_spi.c
+++ b/drivers/net/caif/caif_spi.c
@@ -664,8 +664,6 @@ int cfspi_spi_probe(struct platform_device *pdev)
/* Allocate DMA buffers. */
cfspi->xfer.va_tx = dma_alloc(&cfspi->xfer.pa_tx);
if (!cfspi->xfer.va_tx) {
- printk(KERN_WARNING
- "CFSPI: failed to allocate dma TX buffer.\n");
res = -ENODEV;
goto err_dma_alloc_tx;
}
@@ -673,8 +671,6 @@ int cfspi_spi_probe(struct platform_device *pdev)
cfspi->xfer.va_rx = dma_alloc(&cfspi->xfer.pa_rx);
if (!cfspi->xfer.va_rx) {
- printk(KERN_WARNING
- "CFSPI: failed to allocate dma TX buffer.\n");
res = -ENODEV;
goto err_dma_alloc_rx;
}
diff --git a/net/caif/cfcnfg.c b/net/caif/cfcnfg.c
index 52fe33b..f07ab8c 100644
--- a/net/caif/cfcnfg.c
+++ b/net/caif/cfcnfg.c
@@ -78,10 +78,8 @@ struct cfcnfg *cfcnfg_create(void)
/* Initiate this layer */
this = kzalloc(sizeof(struct cfcnfg), GFP_ATOMIC);
- if (!this) {
- pr_warn("Out of memory\n");
+ if (!this)
return NULL;
- }
this->mux = cfmuxl_create();
if (!this->mux)
goto out_of_mem;
@@ -108,8 +106,6 @@ struct cfcnfg *cfcnfg_create(void)
return this;
out_of_mem:
- pr_warn("Out of memory\n");
-
synchronize_rcu();
kfree(this->mux);
@@ -448,10 +444,8 @@ cfcnfg_linkup_rsp(struct cflayer *layer, u8 channel_id, enum cfctrl_srv serv,
"- unknown channel type\n");
goto unlock;
}
- if (!servicel) {
- pr_warn("Out of memory\n");
+ if (!servicel)
goto unlock;
- }
layer_set_dn(servicel, cnfg->mux);
cfmuxl_set_uplayer(cnfg->mux, servicel, channel_id);
layer_set_up(servicel, adapt_layer);
@@ -497,10 +491,8 @@ got_phyid:
case CFPHYTYPE_FRAG:
phy_driver =
cfserl_create(CFPHYTYPE_FRAG, phyid, stx);
- if (!phy_driver) {
- pr_warn("Out of memory\n");
+ if (!phy_driver)
goto out;
- }
break;
case CFPHYTYPE_CAIF:
phy_driver = NULL;
@@ -521,7 +513,6 @@ got_phyid:
frml = cffrml_create(phyid, fcs);
if (!frml) {
- pr_warn("Out of memory\n");
kfree(phyinfo);
goto out;
}
diff --git a/net/caif/cfctrl.c b/net/caif/cfctrl.c
index e22671b..5cf5222 100644
--- a/net/caif/cfctrl.c
+++ b/net/caif/cfctrl.c
@@ -35,15 +35,12 @@ struct cflayer *cfctrl_create(void)
{
struct dev_info dev_info;
struct cfctrl *this =
- kmalloc(sizeof(struct cfctrl), GFP_ATOMIC);
- if (!this) {
- pr_warn("Out of memory\n");
+ kzalloc(sizeof(struct cfctrl), GFP_ATOMIC);
+ if (!this)
return NULL;
- }
caif_assert(offsetof(struct cfctrl, serv.layer) == 0);
memset(&dev_info, 0, sizeof(dev_info));
dev_info.id = 0xff;
- memset(this, 0, sizeof(*this));
cfsrvl_init(&this->serv, 0, &dev_info, false);
atomic_set(&this->req_seq_no, 1);
atomic_set(&this->rsp_seq_no, 1);
@@ -180,10 +177,8 @@ void cfctrl_enum_req(struct cflayer *layer, u8 physlinkid)
struct cfctrl *cfctrl = container_obj(layer);
struct cfpkt *pkt = cfpkt_create(CFPKT_CTRL_PKT_LEN);
struct cflayer *dn = cfctrl->serv.layer.dn;
- if (!pkt) {
- pr_warn("Out of memory\n");
+ if (!pkt)
return;
- }
if (!dn) {
pr_debug("not able to send enum request\n");
return;
@@ -224,10 +219,8 @@ int cfctrl_linkup_request(struct cflayer *layer,
}
pkt = cfpkt_create(CFPKT_CTRL_PKT_LEN);
- if (!pkt) {
- pr_warn("Out of memory\n");
+ if (!pkt)
return -ENOMEM;
- }
cfpkt_addbdy(pkt, CFCTRL_CMD_LINK_SETUP);
cfpkt_addbdy(pkt, (param->chtype << 4) | param->linktype);
cfpkt_addbdy(pkt, (param->priority << 3) | param->phyid);
@@ -275,10 +268,8 @@ int cfctrl_linkup_request(struct cflayer *layer,
return -EINVAL;
}
req = kzalloc(sizeof(*req), GFP_KERNEL);
- if (!req) {
- pr_warn("Out of memory\n");
+ if (!req)
return -ENOMEM;
- }
req->client_layer = user_layer;
req->cmd = CFCTRL_CMD_LINK_SETUP;
req->param = *param;
@@ -312,10 +303,8 @@ int cfctrl_linkdown_req(struct cflayer *layer, u8 channelid,
struct cfpkt *pkt = cfpkt_create(CFPKT_CTRL_PKT_LEN);
struct cflayer *dn = cfctrl->serv.layer.dn;
- if (!pkt) {
- pr_warn("Out of memory\n");
+ if (!pkt)
return -ENOMEM;
- }
if (!dn) {
pr_debug("not able to send link-down request\n");
diff --git a/net/caif/cfdbgl.c b/net/caif/cfdbgl.c
index 11a2af4..65d6ef3 100644
--- a/net/caif/cfdbgl.c
+++ b/net/caif/cfdbgl.c
@@ -19,13 +19,10 @@ static int cfdbgl_transmit(struct cflayer *layr, struct cfpkt *pkt);
struct cflayer *cfdbgl_create(u8 channel_id, struct dev_info *dev_info)
{
- struct cfsrvl *dbg = kmalloc(sizeof(struct cfsrvl), GFP_ATOMIC);
- if (!dbg) {
- pr_warn("Out of memory\n");
+ struct cfsrvl *dbg = kzalloc(sizeof(struct cfsrvl), GFP_ATOMIC);
+ if (!dbg)
return NULL;
- }
caif_assert(offsetof(struct cfsrvl, layer) == 0);
- memset(dbg, 0, sizeof(struct cfsrvl));
cfsrvl_init(dbg, channel_id, dev_info, false);
dbg->layer.receive = cfdbgl_receive;
dbg->layer.transmit = cfdbgl_transmit;
diff --git a/net/caif/cfdgml.c b/net/caif/cfdgml.c
index 0382dec..0f5ff27 100644
--- a/net/caif/cfdgml.c
+++ b/net/caif/cfdgml.c
@@ -26,13 +26,10 @@ static int cfdgml_transmit(struct cflayer *layr, struct cfpkt *pkt);
struct cflayer *cfdgml_create(u8 channel_id, struct dev_info *dev_info)
{
- struct cfsrvl *dgm = kmalloc(sizeof(struct cfsrvl), GFP_ATOMIC);
- if (!dgm) {
- pr_warn("Out of memory\n");
+ struct cfsrvl *dgm = kzalloc(sizeof(struct cfsrvl), GFP_ATOMIC);
+ if (!dgm)
return NULL;
- }
caif_assert(offsetof(struct cfsrvl, layer) == 0);
- memset(dgm, 0, sizeof(struct cfsrvl));
cfsrvl_init(dgm, channel_id, dev_info, true);
dgm->layer.receive = cfdgml_receive;
dgm->layer.transmit = cfdgml_transmit;
diff --git a/net/caif/cffrml.c b/net/caif/cffrml.c
index 04204b2..f399211 100644
--- a/net/caif/cffrml.c
+++ b/net/caif/cffrml.c
@@ -34,11 +34,9 @@ static u32 cffrml_rcv_error;
static u32 cffrml_rcv_checsum_error;
struct cflayer *cffrml_create(u16 phyid, bool use_fcs)
{
- struct cffrml *this = kmalloc(sizeof(struct cffrml), GFP_ATOMIC);
- if (!this) {
- pr_warn("Out of memory\n");
+ struct cffrml *this = kzalloc(sizeof(struct cffrml), GFP_ATOMIC);
+ if (!this)
return NULL;
- }
this->pcpu_refcnt = alloc_percpu(int);
if (this->pcpu_refcnt == NULL) {
kfree(this);
@@ -47,7 +45,6 @@ struct cflayer *cffrml_create(u16 phyid, bool use_fcs)
caif_assert(offsetof(struct cffrml, layer) == 0);
- memset(this, 0, sizeof(struct cflayer));
this->layer.receive = cffrml_receive;
this->layer.transmit = cffrml_transmit;
this->layer.ctrlcmd = cffrml_ctrlcmd;
diff --git a/net/caif/cfrfml.c b/net/caif/cfrfml.c
index 0deabb4..81660f8 100644
--- a/net/caif/cfrfml.c
+++ b/net/caif/cfrfml.c
@@ -46,13 +46,10 @@ struct cflayer *cfrfml_create(u8 channel_id, struct dev_info *dev_info,
int mtu_size)
{
int tmp;
- struct cfrfml *this =
- kzalloc(sizeof(struct cfrfml), GFP_ATOMIC);
+ struct cfrfml *this = kzalloc(sizeof(struct cfrfml), GFP_ATOMIC);
- if (!this) {
- pr_warn("Out of memory\n");
+ if (!this)
return NULL;
- }
cfsrvl_init(&this->serv, channel_id, dev_info, false);
this->serv.release = cfrfml_release;
diff --git a/net/caif/cfserl.c b/net/caif/cfserl.c
index 2715c84..797c8d1 100644
--- a/net/caif/cfserl.c
+++ b/net/caif/cfserl.c
@@ -33,13 +33,10 @@ static void cfserl_ctrlcmd(struct cflayer *layr, enum caif_ctrlcmd ctrl,
struct cflayer *cfserl_create(int type, int instance, bool use_stx)
{
- struct cfserl *this = kmalloc(sizeof(struct cfserl), GFP_ATOMIC);
- if (!this) {
- pr_warn("Out of memory\n");
+ struct cfserl *this = kzalloc(sizeof(struct cfserl), GFP_ATOMIC);
+ if (!this)
return NULL;
- }
caif_assert(offsetof(struct cfserl, layer) == 0);
- memset(this, 0, sizeof(struct cfserl));
this->layer.receive = cfserl_receive;
this->layer.transmit = cfserl_transmit;
this->layer.ctrlcmd = cfserl_ctrlcmd;
diff --git a/net/caif/cfsrvl.c b/net/caif/cfsrvl.c
index 535a1e7..b99f5b2 100644
--- a/net/caif/cfsrvl.c
+++ b/net/caif/cfsrvl.c
@@ -108,10 +108,8 @@ static int cfservl_modemcmd(struct cflayer *layr, enum caif_modemcmd ctrl)
struct caif_payload_info *info;
u8 flow_on = SRVL_FLOW_ON;
pkt = cfpkt_create(SRVL_CTRL_PKT_SIZE);
- if (!pkt) {
- pr_warn("Out of memory\n");
+ if (!pkt)
return -ENOMEM;
- }
if (cfpkt_add_head(pkt, &flow_on, 1) < 0) {
pr_err("Packet is erroneous!\n");
@@ -130,10 +128,8 @@ static int cfservl_modemcmd(struct cflayer *layr, enum caif_modemcmd ctrl)
struct caif_payload_info *info;
u8 flow_off = SRVL_FLOW_OFF;
pkt = cfpkt_create(SRVL_CTRL_PKT_SIZE);
- if (!pkt) {
- pr_warn("Out of memory\n");
+ if (!pkt)
return -ENOMEM;
- }
if (cfpkt_add_head(pkt, &flow_off, 1) < 0) {
pr_err("Packet is erroneous!\n");
diff --git a/net/caif/cfutill.c b/net/caif/cfutill.c
index 98e027d..53e49f3 100644
--- a/net/caif/cfutill.c
+++ b/net/caif/cfutill.c
@@ -26,13 +26,10 @@ static int cfutill_transmit(struct cflayer *layr, struct cfpkt *pkt);
struct cflayer *cfutill_create(u8 channel_id, struct dev_info *dev_info)
{
- struct cfsrvl *util = kmalloc(sizeof(struct cfsrvl), GFP_ATOMIC);
- if (!util) {
- pr_warn("Out of memory\n");
+ struct cfsrvl *util = kzalloc(sizeof(struct cfsrvl), GFP_ATOMIC);
+ if (!util)
return NULL;
- }
caif_assert(offsetof(struct cfsrvl, layer) == 0);
- memset(util, 0, sizeof(struct cfsrvl));
cfsrvl_init(util, channel_id, dev_info, true);
util->layer.receive = cfutill_receive;
util->layer.transmit = cfutill_transmit;
diff --git a/net/caif/cfveil.c b/net/caif/cfveil.c
index 3ec83fb..910ab06 100644
--- a/net/caif/cfveil.c
+++ b/net/caif/cfveil.c
@@ -25,13 +25,10 @@ static int cfvei_transmit(struct cflayer *layr, struct cfpkt *pkt);
struct cflayer *cfvei_create(u8 channel_id, struct dev_info *dev_info)
{
- struct cfsrvl *vei = kmalloc(sizeof(struct cfsrvl), GFP_ATOMIC);
- if (!vei) {
- pr_warn("Out of memory\n");
+ struct cfsrvl *vei = kzalloc(sizeof(struct cfsrvl), GFP_ATOMIC);
+ if (!vei)
return NULL;
- }
caif_assert(offsetof(struct cfsrvl, layer) == 0);
- memset(vei, 0, sizeof(struct cfsrvl));
cfsrvl_init(vei, channel_id, dev_info, true);
vei->layer.receive = cfvei_receive;
vei->layer.transmit = cfvei_transmit;
diff --git a/net/caif/cfvidl.c b/net/caif/cfvidl.c
index b2f5989..e3f37db 100644
--- a/net/caif/cfvidl.c
+++ b/net/caif/cfvidl.c
@@ -21,14 +21,11 @@ static int cfvidl_transmit(struct cflayer *layr, struct cfpkt *pkt);
struct cflayer *cfvidl_create(u8 channel_id, struct dev_info *dev_info)
{
- struct cfsrvl *vid = kmalloc(sizeof(struct cfsrvl), GFP_ATOMIC);
- if (!vid) {
- pr_warn("Out of memory\n");
+ struct cfsrvl *vid = kzalloc(sizeof(struct cfsrvl), GFP_ATOMIC);
+ if (!vid)
return NULL;
- }
caif_assert(offsetof(struct cfsrvl, layer) == 0);
- memset(vid, 0, sizeof(struct cfsrvl));
cfsrvl_init(vid, channel_id, dev_info, false);
vid->layer.receive = cfvidl_receive;
vid->layer.transmit = cfvidl_transmit;
--
1.7.6.405.gc1be0
^ permalink raw reply related
* Re: [PATCH 1/2] igb: Allow extra 4 bytes on RX for vlan tags.
From: Alexander Duyck @ 2011-08-25 23:31 UTC (permalink / raw)
To: Ben Greear
Cc: Alexander Duyck, jeffrey.t.kirsher, Jesse Gross,
netdev@vger.kernel.org
In-Reply-To: <4E569999.8050006@candelatech.com>
On 08/25/2011 11:51 AM, Ben Greear wrote:
> On 07/20/2011 11:35 PM, Alexander Duyck wrote:
>> On Wed, Jul 20, 2011 at 6:21 PM, Jeff Kirsher
>> <jeffrey.t.kirsher@intel.com> wrote:
>>> On Wed, 2011-07-20 at 17:27 -0700, Ben Greear wrote:
>>>> On 07/20/2011 05:18 PM, Jesse Gross wrote:
>>>>> On Thu, Feb 17, 2011 at 9:28 AM, Ben
>>>>> Greear<greearb@candelatech.com> wrote:
>>>>>> On 02/17/2011 03:04 AM, Jeff Kirsher wrote:
>>>>>>>
>>>>>>> On Thu, Feb 10, 2011 at 13:59,<greearb@candelatech.com> wrote:
>>>>>>>>
>>>>>>>> From: Ben Greear<greearb@candelatech.com>
>>>>>>>>
>>>>>>>> This allows the NIC to receive 1518 byte (not counting
>>>>>>>> FCS) packets when MTU is 1500, thus allowing 1500 MTU
>>>>>>>> VLAN frames to be received. Please note that no VLANs
>>>>>>>> were actually configured on the NIC...it was just acting
>>>>>>>> as pass-through device.
>>>>>>>>
>>>>>>>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>>>>>>>> ---
>>>>>>>> :100644 100644 58c665b... 30c9cc6... M drivers/net/igb/igb_main.c
>>>>>>>> drivers/net/igb/igb_main.c | 5 +++--
>>>>>>>> 1 files changed, 3 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/igb/igb_main.c
>>>>>>>> b/drivers/net/igb/igb_main.c
>>>>>>>> index 58c665b..30c9cc6 100644
>>>>>>>> --- a/drivers/net/igb/igb_main.c
>>>>>>>> +++ b/drivers/net/igb/igb_main.c
>>>>>>>> @@ -2281,7 +2281,8 @@ static int __devinit igb_sw_init(struct
>>>>>>>> igb_adapter
>>>>>>>> *adapter)
>>>>>>>> adapter->rx_itr_setting = IGB_DEFAULT_ITR;
>>>>>>>> adapter->tx_itr_setting = IGB_DEFAULT_ITR;
>>>>>>>>
>>>>>>>> - adapter->max_frame_size = netdev->mtu + ETH_HLEN +
>>>>>>>> ETH_FCS_LEN;
>>>>>>>> + adapter->max_frame_size = (netdev->mtu + ETH_HLEN +
>>>>>>>> ETH_FCS_LEN
>>>>>>>> + + VLAN_HLEN);
>>>>>>>> adapter->min_frame_size = ETH_ZLEN + ETH_FCS_LEN;
>>>>>>>>
>>>>>>>> spin_lock_init(&adapter->stats64_lock);
>>>>>>>> @@ -4303,7 +4304,7 @@ static int igb_change_mtu(struct net_device
>>>>>>>> *netdev, int new_mtu)
>>>>>>>> {
>>>>>>>> struct igb_adapter *adapter = netdev_priv(netdev);
>>>>>>>> struct pci_dev *pdev = adapter->pdev;
>>>>>>>> - int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;
>>>>>>>> + int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN +
>>>>>>>> VLAN_HLEN;
>>>>>>>> u32 rx_buffer_len, i;
>>>>>>>>
>>>>>>>> if ((new_mtu< 68) || (max_frame>
>>>>>>>> MAX_JUMBO_FRAME_SIZE)) {
>>>>>>>
>>>>>>> While testing this patch, validation found that the patch
>>>>>>> reduces the
>>>>>>> maximum mtu size
>>>>>>> by 4 bytes (reduces it from 9216 to 9212). This is not a
>>>>>>> desired side
>>>>>>> effect of this patch.
>>>>>>
>>>>>> You could add handling for that case and have it act as it used
>>>>>> to when
>>>>>> new_mtu is greater than 9212?
>>>>>>
>>>>>> I tested e1000e and it worked w/out hacking at 1500 MTU, so maybe
>>>>>> check how it does it?
>>>>>
>>>>> I just wanted to bring this up again to see if any progress had been
>>>>> made. We were looking at this driver and trying to figure out the
>>>>> best way to convert it to use the new vlan model but I'm not familiar
>>>>
>>>> I've been watching :)
>>>>
>>>>> enough with the hardware to know. It seems that all of the other
>>>>> Intel drivers unconditionally add space for the vlan tag to the
>>>>> receive buffer (and would therefore have similar effects as this
>>>>> patch), is there something different about this card?
>>>>>
>>>>> I believe that Alex was working on something in this area (in the
>>>>> context of one of my patches from a long time ago) but I'm not sure
>>>>> what came of that.
>>>>
>>>> Truth is, I don't really see why it's a problem to decrease the
>>>> maximum MTU slightly in order to make it work with VLANs.
>>>>
>>>> I'm not sure if there is some way to make it work with VLANs
>>>> and not decrease the maximum MTU.
>>>
>>> This was the reason this did not get accepted. I was looking into what
>>> could be done so that we did not decrease the maximum MTU, but I got
>>> side-tracked and have not done anything on it in several months.
>>>
>>
>> I can take a look at fixing this most likely tomorrow. I have some
>> work planned for igb anyway over the next few days.
>>
>> Odds are it is just a matter of where the VLAN_HLEN is added. As I
>> recall for our drivers the correct spot is in the setting of
>> rx_buffer_len since that is the area more concerned with maximum
>> receive frame size versus the mtu section which is more concerned with
>> the transmit side of things.
>
> Did a patch for this ever get posted? I'll be happy to test it
> if so...
>
> Thanks,
> Ben
>
We haven't posted one yet. I have one written up but it is currently
mixed in with a set of 30 patches that I am testing/cleaning
up/formatting before submitting to our formal validation team. I will
likely be submitting it to Jeff Kirsher sometime next week and the
patches will probably be available a few weeks after that.
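
The arithmetic behind the 4-byte regression discussed above can be sketched
as follows. This is a standalone illustration, not driver code: the helper
names are made up, and the frame limit of 9234 used in the note below is
back-computed from the 9216 maximum MTU quoted in this thread rather than
taken from the igb source.

```c
enum {
	ETH_HLEN    = 14,	/* Ethernet header */
	ETH_FCS_LEN = 4,	/* frame check sequence */
	VLAN_HLEN   = 4		/* one 802.1Q tag */
};

/* Frame size the receive path must accept for a given MTU. */
static int max_frame(int mtu, int vlan_room)
{
	return mtu + ETH_HLEN + ETH_FCS_LEN + (vlan_room ? VLAN_HLEN : 0);
}

/* Largest MTU whose frame still fits under a fixed frame-size limit. */
static int max_mtu(int frame_limit, int vlan_room)
{
	return frame_limit - ETH_HLEN - ETH_FCS_LEN -
	       (vlan_room ? VLAN_HLEN : 0);
}
```

With an MTU of 1500, max_frame() gives 1518 without VLAN room and 1522 with
it. Against an assumed limit of 9234, max_mtu() gives 9216 without VLAN room
and 9212 with it, which matches the drop validation reported when VLAN_HLEN
is added to the same quantity that is checked against the jumbo limit.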
Thanks,
Alex
^ permalink raw reply
* [PATCH net-next-2.6] e1000: save skb counts in TX to avoid cache misses
From: Dean Nelson @ 2011-08-26 0:39 UTC (permalink / raw)
To: netdev, Jeff Kirsher; +Cc: Andy Gospodarek
Virtual Machines with emulated e1000 network adapter running on Parallels'
server were seeing kernel panics due to the e1000 driver dereferencing an
unexpected NULL pointer retrieved from buffer_info->skb.
The problem has been addressed for the e1000e driver, but not for the e1000.
Since the two drivers share similar code in the affected area, a port of the
following e1000e driver commit solves the issue for the e1000 driver:
commit 9ed318d546a29d7a591dbe648fd1a2efe3be1180
Author: Tom Herbert <therbert@google.com>
Date: Wed May 5 14:02:27 2010 +0000
e1000e: save skb counts in TX to avoid cache misses
In e1000_tx_map, precompute number of segments and bytecounts which
are derived from fields in skb; these are stored in buffer_info. When
cleaning tx in e1000_clean_tx_irq use the values in the associated
buffer_info for statistics counting, this eliminates cache misses
on skb fields.
Signed-off-by: Dean Nelson <dnelson@redhat.com>
---
This patch (backported to RHEL6.2) was verified by Dmitry Skorodumov to solve
Parallels' reported problem.
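
A standalone sketch of the idea, for illustration only: struct demo_skb and
struct demo_buffer are hypothetical, trimmed-down stand-ins for sk_buff and
e1000_buffer, and the function names are invented. The counters are derived
once at map time and only the cached copies are read at clean time, so the
clean path never dereferences the skb pointer.

```c
/* Hypothetical, trimmed-down stand-ins for sk_buff and e1000_buffer. */
struct demo_skb {
	unsigned int len;	/* total bytes in the packet */
	unsigned int headlen;	/* bytes in the linear area */
	unsigned int gso_segs;	/* 0 when the packet is not segmented */
};

struct demo_buffer {
	const struct demo_skb *skb;
	unsigned int segs;
	unsigned int bytecount;
};

/* Map time: derive the counters while the skb is known to be valid. */
static void demo_tx_map(struct demo_buffer *b, const struct demo_skb *skb)
{
	unsigned int segs = skb->gso_segs ? skb->gso_segs : 1;

	b->skb = skb;
	b->segs = segs;
	/* each segment after the first repeats the linear area on the wire */
	b->bytecount = (segs - 1) * skb->headlen + skb->len;
}

/* Clean time: read only the cached counters; b->skb is never touched. */
static void demo_clean(const struct demo_buffer *b,
		       unsigned int *packets, unsigned int *bytes)
{
	*packets += b->segs;
	*bytes += b->bytecount;
}
```

For a packet with len 4410, headlen 66 and gso_segs 3, the cached values are
3 segments and (3 - 1) * 66 + 4410 = 4542 bytes, and demo_clean() still works
if the skb pointer has been cleared in the meantime.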
drivers/net/ethernet/intel/e1000/e1000.h | 2 ++
drivers/net/ethernet/intel/e1000/e1000_main.c | 18 +++++++++---------
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
index 24f41da..4ea87b1 100644
--- a/drivers/net/ethernet/intel/e1000/e1000.h
+++ b/drivers/net/ethernet/intel/e1000/e1000.h
@@ -150,6 +150,8 @@ struct e1000_buffer {
unsigned long time_stamp;
u16 length;
u16 next_to_watch;
+ unsigned int segs;
+ unsigned int bytecount;
u16 mapped_as_page;
};
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 7c280e5..4a32c15 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -2848,7 +2848,7 @@ static int e1000_tx_map(struct e1000_adapter *adapter,
struct e1000_buffer *buffer_info;
unsigned int len = skb_headlen(skb);
unsigned int offset = 0, size, count = 0, i;
- unsigned int f;
+ unsigned int f, bytecount, segs;
i = tx_ring->next_to_use;
@@ -2949,7 +2949,13 @@ static int e1000_tx_map(struct e1000_adapter *adapter,
}
}
+ segs = skb_shinfo(skb)->gso_segs ?: 1;
+ /* multiply data chunks by size of headers */
+ bytecount = ((segs - 1) * skb_headlen(skb)) + skb->len;
+
tx_ring->buffer_info[i].skb = skb;
+ tx_ring->buffer_info[i].segs = segs;
+ tx_ring->buffer_info[i].bytecount = bytecount;
tx_ring->buffer_info[first].next_to_watch = i;
return count;
@@ -3623,14 +3629,8 @@ static bool e1000_clean_tx_irq(struct e1000_adapter *adapter,
cleaned = (i == eop);
if (cleaned) {
- struct sk_buff *skb = buffer_info->skb;
- unsigned int segs, bytecount;
- segs = skb_shinfo(skb)->gso_segs ?: 1;
- /* multiply data chunks by size of headers */
- bytecount = ((segs - 1) * skb_headlen(skb)) +
- skb->len;
- total_tx_packets += segs;
- total_tx_bytes += bytecount;
+ total_tx_packets += buffer_info->segs;
+ total_tx_bytes += buffer_info->bytecount;
}
e1000_unmap_and_free_tx_resource(adapter, buffer_info);
tx_desc->upper.data = 0;
^ permalink raw reply related
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Wu Fengguang @ 2011-08-26 2:16 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman,
Jens Axboe, Linux Netdev List
In-Reply-To: <4E560F2A.1030801@profihost.ag>
Hi Stefan,
> Here is the data you requested:
>
> root@server1015-han:~# grep . /sys/devices/system/node/node*/vmstat
> /sys/devices/system/node/node0/vmstat:nr_written 5546561
> /sys/devices/system/node/node0/vmstat:nr_dirtied 5572497
> /sys/devices/system/node/node1/vmstat:nr_written 3936
> /sys/devices/system/node/node1/vmstat:nr_dirtied 4190
Ah you are running an older kernel that didn't show all the vmstat
numbers. But still it's revealing that node 0 is used heavily and node
1 is almost idle. So I won't be surprised to see most free pages lie
in node 1.
> modified it a little bit:
> ~# while [ true ]; do ps -eo
> user,pid,tid,class,rtprio,ni,pri,psr,pcpu,vsz,rss,pmem,stat,wchan:28,cmd
> | grep scp | grep -v grep; sleep 1; done
>
> root 12409 12409 TS - 0 19 0 59.8 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
It's mostly doing poll() waits. It must depend on something else to
make progress. Would you post the full ps output for all tasks and,
even better, run
echo t > /proc/sysrq-trigger
to dump the kernel stacks?
Thanks,
Fengguang
> root 12409 12409 TS - 0 19 0 64.0 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 0 67.7 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 8 70.6 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 8 73.5 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 8 76.0 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 8 78.2 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 8 80.0 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 8 80.9 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 2 76.7 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 75.6 42136 1724 0.0 Ds
> pipe_read scp -t /tmp/
> root 12409 12409 TS - 0 19 0 76.0 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 1 75.2 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 1 76.6 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 1 77.9 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 1 79.0 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 72.8 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 0 73.0 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 0 73.8 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 74.3 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 73.4 42136 1724 0.0 Ss
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 1 71.3 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 71.9 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 0 72.7 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 3 73.5 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 3 74.4 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 3 75.2 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 0 76.0 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 8 76.6 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 74.8 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 73.2 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 73.9 42136 1724 0.0 Rs
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 0 72.4 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 8 72.0 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 8 72.5 42136 1724 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 8 72.9 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12409 12409 TS - 0 19 8 73.5 42136 1724 0.0 Rs
> - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 0.0 42136 1728 0.0 Rs
> - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 23.0 42136 1728 0.0 Rs
> - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 49.5 42136 1728 0.0 Rs
> - scp -t /tmp/
> root 12566 12566 TS - 0 19 2 63.3 42136 1728 0.0 Rs
> - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 71.5 42136 1728 0.0 Rs
> - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 77.4 42136 1728 0.0 Rs
> - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 70.3 42136 1728 0.0 Rs
> - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 73.1 42136 1728 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12566 12566 TS - 0 19 0 65.7 42136 1728 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
> root 12566 12566 TS - 0 19 1 61.2 42136 1728 0.0 Ss
> - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 63.7 42136 1728 0.0 Rs
> - scp -t /tmp/
> root 12636 12636 TS - 0 19 8 0.0 42136 1728 0.0 Ss
> poll_schedule_timeout scp -t /tmp/
>
>
> Stefan
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Stefan Priebe - Profihost AG @ 2011-08-26 2:54 UTC (permalink / raw)
To: Wu Fengguang
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman,
Jens Axboe, Linux Netdev List
In-Reply-To: <20110826021648.GA19529@localhost>
Hi Wu,
> Ah you are running an older kernel that didn't show all the vmstat
> numbers. But still it's revealing that node 0 is used heavily and node
> 1 is almost idle. So I won't be surprised to see most free pages lie
> in node 1.
I'm running a 2.6.38 kernel.
There is at least a numastat proc file.
grep . /sys/devices/system/node/node*/numastat
/sys/devices/system/node/node0/numastat:numa_hit 5958586
/sys/devices/system/node/node0/numastat:numa_miss 0
/sys/devices/system/node/node0/numastat:numa_foreign 0
/sys/devices/system/node/node0/numastat:interleave_hit 4191
/sys/devices/system/node/node0/numastat:local_node 5885189
/sys/devices/system/node/node0/numastat:other_node 73397
/sys/devices/system/node/node1/numastat:numa_hit 488922
/sys/devices/system/node/node1/numastat:numa_miss 0
/sys/devices/system/node/node1/numastat:numa_foreign 0
/sys/devices/system/node/node1/numastat:interleave_hit 4187
/sys/devices/system/node/node1/numastat:local_node 386741
/sys/devices/system/node/node1/numastat:other_node 102181
>> modified it a little bit:
>> ~# while [ true ]; do ps -eo
>> user,pid,tid,class,rtprio,ni,pri,psr,pcpu,vsz,rss,pmem,stat,wchan:28,cmd
>> | grep scp | grep -v grep; sleep 1; done
>>
>> root 12409 12409 TS - 0 19 0 59.8 42136 1724 0.0 Ss
>> poll_schedule_timeout scp -t /tmp/
>
> It's mostly doing poll() waits. There must be some dependency on
> something else to make progress. Would you post the full ps output
> for all tasks, and even better, run
complete ps output:
http://pastebin.com/raw.php?i=b948svzN
> echo t> /proc/sysrq-trigger
Sadly I was only able to grab the output in this crazy format:
http://pastebin.com/raw.php?i=MBXvvyH1
Hope that still helps.
Thanks Stefan
^ permalink raw reply
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Wu Fengguang @ 2011-08-26 3:03 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman,
Jens Axboe, Linux Netdev List
In-Reply-To: <4E570AEB.1040703@profihost.ag>
On Fri, Aug 26, 2011 at 10:54:35AM +0800, Stefan Priebe - Profihost AG wrote:
> Hi Wu,
>
> > Ah you are running an older kernel that didn't show all the vmstat
> > numbers. But still it's revealing that node 0 is used heavily and node
> > 1 is almost idle. So I won't be surprised to see most free pages lie
> > in node 1.
> I'm running a 2.6.38 kernel.
>
> There is at least a numastat proc file.
Thanks. This shows that node0 is accessed 10x more than node1.
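The comparison can be reproduced from the numa_hit lines in the numastat
output quoted below. A minimal sketch, assuming input in the
"path:counter value" form that grep prints; the function name
numa_hit_summary is hypothetical and not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): summarize numa_hit per node.
# Reads lines on stdin in the form printed by
#   grep . /sys/devices/system/node/node*/numastat
# e.g. "/sys/devices/system/node/node0/numastat:numa_hit 5958586"
numa_hit_summary() {
    awk '
        /numastat:numa_hit / {
            # The node name is the second-to-last path component.
            n = split($1, parts, "/")
            node = parts[n - 1]
            hits[node] = $2
            total += $2
        }
        END {
            # Print each node with its share of all numa_hit events.
            for (node in hits)
                printf "%s numa_hit=%d share=%.1f%%\n", node, hits[node], 100 * hits[node] / total
        }
    ' | sort
}
```

Usage would be something like
`grep . /sys/devices/system/node/node*/numastat | numa_hit_summary`.
On the figures in this thread it gives about 92.4% for node0 and 7.6%
for node1, i.e. a ratio of roughly 12:1.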
> grep . /sys/devices/system/node/node*/numastat
> /sys/devices/system/node/node0/numastat:numa_hit 5958586
> /sys/devices/system/node/node0/numastat:numa_miss 0
> /sys/devices/system/node/node0/numastat:numa_foreign 0
> /sys/devices/system/node/node0/numastat:interleave_hit 4191
> /sys/devices/system/node/node0/numastat:local_node 5885189
> /sys/devices/system/node/node0/numastat:other_node 73397
> /sys/devices/system/node/node1/numastat:numa_hit 488922
> /sys/devices/system/node/node1/numastat:numa_miss 0
> /sys/devices/system/node/node1/numastat:numa_foreign 0
> /sys/devices/system/node/node1/numastat:interleave_hit 4187
> /sys/devices/system/node/node1/numastat:local_node 386741
> /sys/devices/system/node/node1/numastat:other_node 102181
>
> >> modified it a little bit:
> >> ~# while [ true ]; do ps -eo
> >> user,pid,tid,class,rtprio,ni,pri,psr,pcpu,vsz,rss,pmem,stat,wchan:28,cmd
> >> | grep scp | grep -v grep; sleep 1; done
> >>
> >> root 12409 12409 TS - 0 19 0 59.8 42136 1724 0.0 Ss
> >> poll_schedule_timeout scp -t /tmp/
> >
> > It's mostly doing poll() waits. There must be some dependency on
> > something else to make progress. Would you post the full ps output
> > for all tasks, and even better, run
> complete ps output:
> http://pastebin.com/raw.php?i=b948svzN
In that log, scp happens to be in R state and no other tasks are in D
state. Would you retry in the hope of catching a stuck state?
> > echo t> /proc/sysrq-trigger
> Sadly I was only able to grab the output in this crazy format:
> http://pastebin.com/raw.php?i=MBXvvyH1
It's a pretty readable dmesg, except that the data is incomplete and
there is nothing valuable in the uploaded portion.
Thanks,
Fengguang
^ permalink raw reply
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Stefan Priebe @ 2011-08-26 3:13 UTC (permalink / raw)
To: Wu Fengguang
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman,
Jens Axboe, Linux Netdev List
In-Reply-To: <20110826030313.GA24058@localhost>
>> There is at least a numastat proc file.
>
> Thanks. This shows that node0 is accessed 10x more than node1.
What can I do to prevent this? Or isn't this normal when a machine mostly idles, so processes are mostly run on cpu0?
>
>> complete ps output:
>> http://pastebin.com/raw.php?i=b948svzN
>
> In that log, scp happens to be in R state and no other tasks are in D
> state. Would you retry in the hope of catching a stuck state?
Sadly not, as the sysrq trigger rebooted the machine and it will now run fine for 1 or 2 days.
>
>>> echo t> /proc/sysrq-trigger
>> Sadly I was only able to grab the output in this crazy format:
>> http://pastebin.com/raw.php?i=MBXvvyH1
>
> It's a pretty readable dmesg, except that the data is incomplete and
> there is nothing valuable in the uploaded portion.
That was everything I could grab through netconsole. Is there a better way?
Stefan
>
^ permalink raw reply
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Wu Fengguang @ 2011-08-26 3:26 UTC (permalink / raw)
To: Stefan Priebe
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman,
Jens Axboe, Linux Netdev List
In-Reply-To: <D299D0AE-2F3C-42E2-9723-A3D7C0108C40@profihost.ag>
On Fri, Aug 26, 2011 at 11:13:07AM +0800, Stefan Priebe wrote:
>
> >> There is at least a numastat proc file.
> >
> > Thanks. This shows that node0 is accessed 10x more than node1.
>
> What can I do to prevent this? Or isn't this normal when a machine mostly idles, so processes are mostly run on cpu0?
Yes, that's normal. However it should explain why it's slow even when
there are lots of free pages _globally_.
> >
> >> complete ps output:
> >> http://pastebin.com/raw.php?i=b948svzN
> >
> > In that log, scp happens to be in R state and no other tasks are in D
> > state. Would you retry in the hope of catching a stuck state?
> Sadly not, as the sysrq trigger rebooted the machine and it will now run fine for 1 or 2 days.
Oops, sorry! It might be possible to reproduce the issue by manually
eating all of the memory with sparse file data:
truncate -s 1T 1T
cp 1T /dev/null
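A slightly more cautious variant of the two commands above, as a sketch
only. The function name fill_page_cache and its arguments are
assumptions; it just parameterizes the size and path and removes the
sparse file afterwards:

```shell
#!/bin/sh
# Hypothetical helper: fill the page cache by reading a sparse file.
#   $1: size understood by truncate(1), default 1T as in the commands above
#   $2: path of the temporary sparse file (assumed writable)
fill_page_cache() {
    size="${1:-1T}"
    file="${2:-./sparse.$$}"
    # A sparse file takes almost no disk space...
    truncate -s "$size" "$file" || return 1
    # ...but reading it still pulls zero-filled pages into the page cache.
    cp "$file" /dev/null
    rc=$?
    rm -f "$file"
    return "$rc"
}
```

For example, `fill_page_cache 1T /tmp/1T` would do the same as the two
commands above and clean up afterwards.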
> >
> >>> echo t> /proc/sysrq-trigger
> >> Sadly I was only able to grab the output in this crazy format:
> >> http://pastebin.com/raw.php?i=MBXvvyH1
> >
> > It's a pretty readable dmesg, except that the data is incomplete and
> > there is nothing valuable in the uploaded portion.
> That was everything I could grab through netconsole. Is there a better way?
netconsole is enough. The partial output should be due to the reboot...
Thanks,
Fengguang
^ permalink raw reply
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Zhu Yanhai @ 2011-08-26 3:30 UTC (permalink / raw)
To: Wu Fengguang
Cc: Stefan Priebe, Pekka Enberg, LKML, linux-mm@kvack.org,
Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List
In-Reply-To: <20110826032601.GA26282@localhost>
Fengguang,
Maybe it's because of zone_reclaim_mode? We have often received reports
that scp or something similar is slow for no apparent reason, and mostly
it's because someone enabled zone_reclaim_mode by mistake.
Stefan, is your zone_reclaim_mode enabled? Try
'cat /proc/sys/vm/zone_reclaim_mode', and echo 0 to it to disable it.
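For reference, a minimal sketch of that check. The function name
check_zone_reclaim is hypothetical, and the optional path argument is
only there so the logic can be tried on a copy of the file:

```shell
#!/bin/sh
# Hypothetical helper: report whether zone_reclaim_mode is enabled.
#   $1: file to read, default /proc/sys/vm/zone_reclaim_mode
# Returns 0 if the mode is 0 (off), 1 if it is non-zero, 2 if unreadable.
check_zone_reclaim() {
    f="${1:-/proc/sys/vm/zone_reclaim_mode}"
    [ -r "$f" ] || { echo "cannot read $f" >&2; return 2; }
    mode=$(cat "$f")
    if [ "$mode" -eq 0 ]; then
        echo "zone_reclaim_mode is off"
    else
        # Disabling it requires root; this only prints the command.
        echo "zone_reclaim_mode=$mode (enabled); to disable, as root: echo 0 > $f"
        return 1
    fi
}
```

Running `check_zone_reclaim` with no argument reads the live setting;
the change itself is left as a manual step on purpose.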
Thanks,
Zhu Yanhai
2011/8/26 Wu Fengguang <fengguang.wu@intel.com>:
> On Fri, Aug 26, 2011 at 11:13:07AM +0800, Stefan Priebe wrote:
>>
>> >> There is at least a numastat proc file.
>> >
>> > Thanks. This shows that node0 is accessed 10x more than node1.
>>
>> What can I do to prevent this? Or isn't this normal when a machine mostly idles, so processes are mostly run on cpu0?
>
> Yes, that's normal. However it should explain why it's slow even when
> there are lots of free pages _globally_.
>
>> >
>> >> complete ps output:
>> >> http://pastebin.com/raw.php?i=b948svzN
>> >
>> > In that log, scp happens to be in R state and no other tasks are in D
>> > state. Would you retry in the hope of catching a stuck state?
>> Sadly not, as the sysrq trigger rebooted the machine and it will now run fine for 1 or 2 days.
>
> Oops, sorry! It might be possible to reproduce the issue by manually
> eating all of the memory with sparse file data:
>
> truncate -s 1T 1T
> cp 1T /dev/null
>
>> >
>> >>> echo t> /proc/sysrq-trigger
>> >> Sadly I was only able to grab the output in this crazy format:
>> >> http://pastebin.com/raw.php?i=MBXvvyH1
>> >
>> > It's a pretty readable dmesg, except that the data is incomplete and
>> > there is nothing valuable in the uploaded portion.
>> That was everything I could grab through netconsole. Is there a better way?
>
> netconsole is enough. The partial output should be due to the reboot...
>
> Thanks,
> Fengguang
>
^ permalink raw reply