* Re: [Q/RFC] BPF use in broader scope
From: Li Yu @ 2012-05-11 7:06 UTC (permalink / raw)
To: Jiri Pirko
Cc: Eric Dumazet, David Miller, netdev, bhutchings, shemminger, matt
In-Reply-To: <20120511062257.GA1561@minipsycho>
On 2012-05-11 14:22, Jiri Pirko wrote:
> Fri, May 11, 2012 at 04:41:31AM CEST, raise.sail@gmail.com wrote:
>> On 2012-03-29 17:31, Jiri Pirko wrote:
>>> Thu, Mar 29, 2012 at 10:45:32AM CEST, eric.dumazet@gmail.com wrote:
>>>> On Thu, 2012-03-29 at 10:31 +0200, Jiri Pirko wrote:
>>>>> Thu, Mar 29, 2012 at 10:02:25AM CEST, eric.dumazet@gmail.com wrote:
>>>>>> On Thu, 2012-03-29 at 09:54 +0200, Jiri Pirko wrote:
>>>>>>
>>>>>>> Yep, I'm aware. I must admit that the JIT code scares me a little :(
>>>>>>>
>>>>>>
>>>>>> If you add a new XOR instruction in the interpreter only, the JIT compiler
>>>>>> will automatically abort, so there is no risk.
>>>>>>
>>>>>> Each arch maintainer will add the support for the new instructions as
>>>>>> separate patches.
>>>>>>
>>>>>> So you can focus on net/core/filter.c file only.
>>>>>>
>>>>>
>>>>> Ok - I can do this for 2). But for 3) JITs need to be modified. So I
>>>>> would like to kindly ask you and Matt if you can do this modification so
>>>>> bpf_func takes pointer to mem (scratch store) as second parameter. I'm
>>>>> sure it's very easy for you to do.
>>>>
>>>> I am not sure why you want this.
>>>>
>>>> This adds register pressure (at least for x86) ...
>>>
>>> Well, I think it would be handy to be able to pass some data
>>> to bpf_func (other than the skb). But it's just an idea.
>>>
>>
>> Hi Jiri, any progress on extended BPF? :)
>>
>> I am very interested in 3). For my requirements, BPF only
>> needs the ability to handle an arbitrary "pre-filled memory
>> area"; it does not need to handle both an skb and such a
>> memory area at the same time, so I think that register
>> pressure should not become a performance bottleneck here.
>>
>> Otherwise, I must construct a fake sk_buff to execute the
>> filter, which is ugly, isn't it?
>>
>> I guess that Nuno Martins's requirements are similar.
>>
>> I would also like to join this project, if you need help.
>
> For my needs it turned out I do not need pre-filled memory. So I dropped
> that point.
>
Oops. I may try to work on this myself; would you mind sending
me a copy of your sk-unattached filters patch?
I think it would be a good starting point.
Thanks for your time.
Yu
>>
>> Thanks
>>
>> Yu
>>
>>>>
>>>>
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>
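The "pre-filled memory area" idea discussed above can be illustrated with a minimal user-space sketch. It is not kernel code and not the classic BPF instruction set: the `mini_*` opcodes, the `mini_insn` layout and `mini_run_filter()` are hypothetical names invented for this illustration. The only point is that a filter can be evaluated against a plain buffer, so no fake sk_buff has to be constructed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mini instruction set, loosely modeled on classic BPF. */
enum mini_op {
	MINI_LD_B_ABS,	/* A = buf[k]; verdict is 0 if k is out of range */
	MINI_JEQ_K,	/* if (A == k) skip jt insns, else skip jf insns */
	MINI_RET_K,	/* return k as the verdict */
};

struct mini_insn {
	enum mini_op op;
	uint8_t jt;
	uint8_t jf;
	uint32_t k;
};

/*
 * Run a filter over an arbitrary memory area instead of an sk_buff.
 * Returns the filter verdict, or 0 if the program runs off its end.
 */
uint32_t mini_run_filter(const struct mini_insn *prog, size_t len,
			 const uint8_t *buf, size_t buflen)
{
	uint32_t a = 0;		/* accumulator */
	size_t pc = 0;

	assert(prog != NULL && buf != NULL);

	while (pc < len) {
		const struct mini_insn *insn = &prog[pc++];

		switch (insn->op) {
		case MINI_LD_B_ABS:
			if (insn->k >= buflen)
				return 0;
			a = buf[insn->k];
			break;
		case MINI_JEQ_K:
			pc += (a == insn->k) ? insn->jt : insn->jf;
			break;
		case MINI_RET_K:
			return insn->k;
		}
	}
	return 0;
}
```

A four-instruction program (load byte 2, compare with 0x2a, return 1 on match, return 0 otherwise) accepts a buffer whose third byte is 0x2a and rejects anything else, including buffers that are too short.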
^ permalink raw reply
* Re: [Q/RFC] BPF use in broader scope
From: Jiri Pirko @ 2012-05-11 6:22 UTC (permalink / raw)
To: Li Yu; +Cc: Eric Dumazet, David Miller, netdev, bhutchings, shemminger, matt
In-Reply-To: <4FAC7C5B.40109@gmail.com>
Fri, May 11, 2012 at 04:41:31AM CEST, raise.sail@gmail.com wrote:
>On 2012-03-29 17:31, Jiri Pirko wrote:
>>Thu, Mar 29, 2012 at 10:45:32AM CEST, eric.dumazet@gmail.com wrote:
>>>On Thu, 2012-03-29 at 10:31 +0200, Jiri Pirko wrote:
>>>>Thu, Mar 29, 2012 at 10:02:25AM CEST, eric.dumazet@gmail.com wrote:
>>>>>On Thu, 2012-03-29 at 09:54 +0200, Jiri Pirko wrote:
>>>>>
>>>>>>Yep, I'm aware. I must admit that the JIT code scares me a little :(
>>>>>>
>>>>>
>>>>>If you add a new XOR instruction in the interpreter only, the JIT compiler
>>>>>will automatically abort, so there is no risk.
>>>>>
>>>>>Each arch maintainer will add the support for the new instructions as
>>>>>separate patches.
>>>>>
>>>>>So you can focus on net/core/filter.c file only.
>>>>>
>>>>
>>>>Ok - I can do this for 2). But for 3) JITs need to be modified. So I
>>>>would like to kindly ask you and Matt if you can do this modification so
>>>>bpf_func takes pointer to mem (scratch store) as second parameter. I'm
>>>>sure it's very easy for you to do.
>>>
>>>I am not sure why you want this.
>>>
>>>This adds register pressure (at least for x86) ...
>>
>>Well, I think it would be handy to be able to pass some data
>>to bpf_func (other than the skb). But it's just an idea.
>>
>
>Hi Jiri, any progress on extended BPF? :)
>
>I am very interested in 3). For my requirements, BPF only
>needs the ability to handle an arbitrary "pre-filled memory
>area"; it does not need to handle both an skb and such a
>memory area at the same time, so I think that register
>pressure should not become a performance bottleneck here.
>
>Otherwise, I must construct a fake sk_buff to execute the
>filter, which is ugly, isn't it?
>
>I guess that Nuno Martins's requirements are similar.
>
>I would also like to join this project, if you need help.
For my needs it turned out I do not need pre-filled memory. So I dropped
that point.
>
>Thanks
>
>Yu
>
>>>
>>>
>>>
>>
>
^ permalink raw reply
* [PATCH iproute2] tc_codel: Controlled Delay AQM
From: Eric Dumazet @ 2012-05-11 6:22 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, Dave Taht
From: Eric Dumazet <edumazet@google.com>
An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.
http://queue.acm.org/detail.cfm?id=2209336
The main input of this AQM is no longer the queue size in bytes or packets,
but the delay packets experience in the (FIFO) queue.
As we don't have infinite memory, we can still drop packets in enqueue()
under massive load, but the point of CoDel is to drop packets in
dequeue(), using a control law based on two simple parameters:
  target   : target sojourn time (default 5ms)
  interval : width of the moving time window (default 100ms)
Selected packets are dropped, unless ECN is enabled and packets can get
ECN mark instead.
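The dequeue-side decision described above can be sketched in user space as follows. This is a deliberately simplified model, not the kernel implementation from the net-next patch: the struct and function names are hypothetical, time is in microseconds with 0 reserved as "unset", and refinements of the real algorithm (such as reusing the drop count across dropping episodes) are left out. The control law schedules the next drop `interval / sqrt(count)` after the previous one; a small Newton iteration stands in for `sqrt()` only so that the sketch builds without libm.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t usec_t;	/* microseconds; 0 is reserved as "unset" */

struct codel_sketch {
	usec_t first_above_time;	/* when dropping may start, 0 if unset */
	usec_t drop_next;		/* time of the next scheduled drop */
	uint32_t count;			/* drops since entering dropping state */
	int dropping;
};

/* Newton iteration for sqrt(x), x >= 1; avoids a libm dependency. */
static double sketch_sqrt(double x)
{
	double g = x;
	int i;

	for (i = 0; i < 40; i++)
		g = 0.5 * (g + x / g);
	return g;
}

/* Control law: next drop happens interval / sqrt(count) after t. */
usec_t codel_control_law(usec_t t, usec_t interval, uint32_t count)
{
	assert(count > 0);
	return t + (usec_t)((double)interval / sketch_sqrt((double)count) + 0.5);
}

/*
 * Called for each packet leaving the queue at time now (now > 0).
 * Returns 1 if the packet should be dropped (or ECN-marked), else 0.
 */
int codel_should_drop(struct codel_sketch *st, usec_t sojourn, usec_t now,
		      usec_t target, usec_t interval)
{
	if (sojourn < target) {
		/* Delay is acceptable again: leave dropping state. */
		st->first_above_time = 0;
		st->dropping = 0;
		return 0;
	}
	if (st->first_above_time == 0) {
		/* First packet above target: wait one interval before acting. */
		st->first_above_time = now + interval;
		return 0;
	}
	if (!st->dropping) {
		if (now < st->first_above_time)
			return 0;
		st->dropping = 1;
		st->count = 1;
		st->drop_next = codel_control_law(now, interval, st->count);
		return 1;
	}
	if (now < st->drop_next)
		return 0;
	st->count++;
	st->drop_next = codel_control_law(st->drop_next, interval, st->count);
	return 1;
}
```

With target 5ms and interval 100ms, a queue whose sojourn time stays at 6ms sees its first drop one interval after the delay first exceeded the target, and later drops at shrinking spacing (100ms, then about 70.7ms, and so on) until the delay falls below the target again.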
Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ]
                          [ interval TIME ] [ ecn ]

qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn
 Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0)
 rate 202365Kbit 16708pps backlog 113550b 75p requeues 0
  count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us
  maxpacket 1514 ecn_mark 84399 drop_overlimit 0
CoDel must be seen as a base module, and should be used keeping in mind
there is still a FIFO queue. So a typical setup will probably need a
hierarchy of several qdiscs and packet classifiers to be able to meet
whatever constraints a user might have.
One possible example would be to use fq_codel, which combines Fair
Queueing and CoDel, in replacement of sfq / sfq_red.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
---
Notes :
1) : Dave Taht will send a nice man-page for this stuff.
2) : the TCA_NETEM_ECN bit is because of include/linux/pkt_sched.h sync
with net-next
(I'll send a separate patch for netem)
*
include/linux/pkt_sched.h | 27 +++++
tc/Makefile | 1
tc/q_codel.c | 188 ++++++++++++++++++++++++++++++++++++
3 files changed, 216 insertions(+)
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 410b33d..cde56c2 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -509,6 +509,7 @@ enum {
 	TCA_NETEM_CORRUPT,
 	TCA_NETEM_LOSS,
 	TCA_NETEM_RATE,
+	TCA_NETEM_ECN,
 	__TCA_NETEM_MAX,
 };
 
@@ -654,4 +655,30 @@ struct tc_qfq_stats {
 	__u32 lmax;
 };
 
+/* CODEL */
+
+enum {
+	TCA_CODEL_UNSPEC,
+	TCA_CODEL_TARGET,
+	TCA_CODEL_LIMIT,
+	TCA_CODEL_INTERVAL,
+	TCA_CODEL_ECN,
+	__TCA_CODEL_MAX
+};
+
+#define TCA_CODEL_MAX	(__TCA_CODEL_MAX - 1)
+
+struct tc_codel_xstats {
+	__u32	maxpacket; /* largest packet we've seen so far */
+	__u32	count;	   /* how many drops we've done since the last time we
+			    * entered dropping state
+			    */
+	__u32	lastcount; /* count at entry to dropping state */
+	__u32	ldelay;    /* in-queue delay seen by most recently dequeued packet */
+	__s32	drop_next; /* time to drop next packet */
+	__u32	drop_overlimit; /* number of time max qdisc packet limit was hit */
+	__u32	ecn_mark;  /* number of packets we ECN marked instead of dropped */
+	__u32	dropping;  /* are we in dropping state ? */
+};
+
 #endif
diff --git a/tc/Makefile b/tc/Makefile
index be8cd5a..8a7cc8d 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -47,6 +47,7 @@ TCMODULES += em_cmp.o
 TCMODULES += em_u32.o
 TCMODULES += em_meta.o
 TCMODULES += q_mqprio.o
+TCMODULES += q_codel.o
 
 TCSO :=
 ifeq ($(TC_CONFIG_ATM),y)
diff --git a/tc/q_codel.c b/tc/q_codel.c
new file mode 100644
index 0000000..9f40046
--- /dev/null
+++ b/tc/q_codel.c
@@ -0,0 +1,188 @@
+/*
+ * Codel - The Controlled-Delay Active Queue Management algorithm
+ *
+ * Copyright (C) 2011-2012 Kathleen Nichols <nichols@pollere.com>
+ * Copyright (C) 2011-2012 Van Jacobson <van@pollere.com>
+ * Copyright (C) 2012 Michael D. Taht <dave.taht@bufferbloat.net>
+ * Copyright (C) 2012 Eric Dumazet <edumazet@google.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions, and the following disclaimer,
+ * without modification.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. The names of the authors may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: ... codel [ limit PACKETS ] [ target TIME]\n");
+	fprintf(stderr, "                 [ interval TIME ] [ ecn ]\n");
+}
+
+static int codel_parse_opt(struct qdisc_util *qu, int argc, char **argv,
+			   struct nlmsghdr *n)
+{
+	unsigned limit = 0;
+	unsigned target = 0;
+	unsigned interval = 0;
+	int ecn = -1;
+	struct rtattr *tail;
+
+	while (argc > 0) {
+		if (strcmp(*argv, "limit") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&limit, *argv, 0)) {
+				fprintf(stderr, "Illegal \"limit\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "target") == 0) {
+			NEXT_ARG();
+			if (get_time(&target, *argv)) {
+				fprintf(stderr, "Illegal \"target\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "interval") == 0) {
+			NEXT_ARG();
+			if (get_time(&interval, *argv)) {
+				fprintf(stderr, "Illegal \"interval\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "ecn") == 0) {
+			ecn = 1;
+		} else if (strcmp(*argv, "noecn") == 0) {
+			ecn = 0;
+		} else if (strcmp(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	if (limit)
+		addattr_l(n, 1024, TCA_CODEL_LIMIT, &limit, sizeof(limit));
+	if (interval)
+		addattr_l(n, 1024, TCA_CODEL_INTERVAL, &interval, sizeof(interval));
+	if (target)
+		addattr_l(n, 1024, TCA_CODEL_TARGET, &target, sizeof(target));
+	if (ecn != -1)
+		addattr_l(n, 1024, TCA_CODEL_ECN, &ecn, sizeof(ecn));
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+	return 0;
+}
+
+static int codel_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+	struct rtattr *tb[TCA_CODEL_MAX + 1];
+	unsigned limit;
+	unsigned interval;
+	unsigned target;
+	unsigned ecn;
+	SPRINT_BUF(b1);
+
+	if (opt == NULL)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_CODEL_MAX, opt);
+
+	if (tb[TCA_CODEL_LIMIT] &&
+	    RTA_PAYLOAD(tb[TCA_CODEL_LIMIT]) >= sizeof(__u32)) {
+		limit = rta_getattr_u32(tb[TCA_CODEL_LIMIT]);
+		fprintf(f, "limit %up ", limit);
+	}
+	if (tb[TCA_CODEL_TARGET] &&
+	    RTA_PAYLOAD(tb[TCA_CODEL_TARGET]) >= sizeof(__u32)) {
+		target = rta_getattr_u32(tb[TCA_CODEL_TARGET]);
+		fprintf(f, "target %s ", sprint_time(target, b1));
+	}
+	if (tb[TCA_CODEL_INTERVAL] &&
+	    RTA_PAYLOAD(tb[TCA_CODEL_INTERVAL]) >= sizeof(__u32)) {
+		interval = rta_getattr_u32(tb[TCA_CODEL_INTERVAL]);
+		fprintf(f, "interval %s ", sprint_time(interval, b1));
+	}
+	if (tb[TCA_CODEL_ECN] &&
+	    RTA_PAYLOAD(tb[TCA_CODEL_ECN]) >= sizeof(__u32)) {
+		ecn = rta_getattr_u32(tb[TCA_CODEL_ECN]);
+		if (ecn)
+			fprintf(f, "ecn ");
+	}
+
+	return 0;
+}
+
+static int codel_print_xstats(struct qdisc_util *qu, FILE *f,
+			      struct rtattr *xstats)
+{
+	struct tc_codel_xstats *st;
+	SPRINT_BUF(b1);
+
+	if (xstats == NULL)
+		return 0;
+
+	if (RTA_PAYLOAD(xstats) < sizeof(*st))
+		return -1;
+
+	st = RTA_DATA(xstats);
+	fprintf(f, "  count %u lastcount %u ldelay %s",
+		st->count, st->lastcount, sprint_time(st->ldelay, b1));
+	if (st->dropping)
+		fprintf(f, " dropping");
+	if (st->drop_next < 0)
+		fprintf(f, " drop_next -%s", sprint_time(-st->drop_next, b1));
+	else
+		fprintf(f, " drop_next %s", sprint_time(st->drop_next, b1));
+	fprintf(f, "\n  maxpacket %u ecn_mark %u drop_overlimit %u",
+		st->maxpacket, st->ecn_mark, st->drop_overlimit);
+	return 0;
+
+}
+
+struct qdisc_util codel_qdisc_util = {
+	.id		= "codel",
+	.parse_qopt	= codel_parse_opt,
+	.print_qopt	= codel_print_opt,
+	.print_xstats	= codel_print_xstats,
+};
^ permalink raw reply related
* Re: [PATCH] wlcore: wlcore should depend on MAC80211
From: Sasha Levin @ 2012-05-11 6:22 UTC (permalink / raw)
To: Luciano Coelho; +Cc: linville, linux-wireless, netdev, linux-kernel
In-Reply-To: <1336712810.12189.146.camel@cumari.coelho.fi>
On Fri, May 11, 2012 at 7:06 AM, Luciano Coelho <coelho@ti.com> wrote:
> On Fri, 2012-05-11 at 06:39 +0200, Sasha Levin wrote:
>> wlcore can't be built if MAC80211 isn't set.
>>
>> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
>> ---
>
> Thanks for your patch, Sasha! But this change was already submitted and
> has been applied in the wl12xx.git tree. I've sent a pull request with
> it already, but it hasn't been pulled into wireless-next and up.
>
> http://www.spinics.net/lists/linux-next/msg19910.html
> http://www.spinics.net/lists/linux-wireless/msg89583.html
Ah, no worries.
Can you consider adding the next branch of the wl12xx git tree to linux-next?
^ permalink raw reply
* Re: [PATCH v13 net-next] codel: Controlled Delay AQM
From: Eric Dumazet @ 2012-05-11 6:01 UTC (permalink / raw)
To: David Miller
Cc: dave.taht, nichols, van, codel, bloat, netdev, therbert, ycheng,
mattmathis, shemminger
In-Reply-To: <20120510.233632.1955738491150675180.davem@davemloft.net>
On Thu, 2012-05-10 at 23:36 -0400, David Miller wrote:
> Applied, but there was a lot of trailing whitespace and a few
> comments mis-formatted which I had to fix up.
>
> Thanks.
Arg, Stephen told me that too but I didn't have a chance to send a v14 ;)
Thanks !
^ permalink raw reply
* Re: [net-next 07/12] ixgbe: Enable timesync clock-out feature for PPS support on X540
From: Richard Cochran @ 2012-05-11 5:40 UTC (permalink / raw)
To: Keller, Jacob E
Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <02874ECE860811409154E81DA85FBB5807794293@ORSMSX105.amr.corp.intel.com>
On Thu, May 10, 2012 at 10:08:44PM +0000, Keller, Jacob E wrote:
>
> Oops, stupid mail program sent that by accident. Anyway: I think you might be
> right, Richard. We don't read those timestamp values unless the stat err bit
> for timestamps is set on the descriptor. But I am not sure what happens when the
> timestamped packet is dropped off the end of the ring. What would you propose
> here? How can we detect if this timestamp doesn't match the packet? I can look
> into using the extra hardware features for matching timestamps. That might be
> more useful, in that it would help prevent this case.
[ Talking about the Rx time stamping locking from other patch... ]
The IGB provides some PTP event packet identification fields (seqNum,
etc) just for the purpose of matching time stamps to packets. Some of
the other PHC drivers (ixp4xx, dp83640) have code that does the
matching.
HTH,
Richard
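To make the matching idea concrete, here is a toy user-space model of a one-entry Rx time stamp latch. It does not reflect real IGB or ixgbe registers: the `toy_*` names are hypothetical, and the model assumes the driver reads the latch for every received PTP event packet, which is an assumption made for this sketch rather than the behaviour of either driver. It shows that comparing the latched sequence ID with the packet's own ID lets the driver discard a stamp that belongs to a lost packet instead of attaching it to the wrong one.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a one-entry hardware Rx time stamp latch. */
struct toy_ts_latch {
	int locked;		/* set while a latched value awaits readout */
	uint16_t seq_id;	/* PTP sequenceId of the stamped packet */
	uint64_t ns;		/* latched time stamp */
};

struct toy_rx_pkt {
	uint16_t seq_id;	/* sequenceId parsed from the PTP header */
	uint64_t hwtstamp;	/* 0 means "no hardware time stamp" */
};

/* Hardware side: stamp an event packet unless the latch is still locked. */
int toy_hw_latch(struct toy_ts_latch *l, uint16_t seq_id, uint64_t ns)
{
	assert(l);
	if (l->locked)
		return 0;
	l->locked = 1;
	l->seq_id = seq_id;
	l->ns = ns;
	return 1;
}

/*
 * Driver side: read the latch, which re-arms it, and attach the value
 * only if it belongs to this packet. Returns 1 if a stamp was attached.
 */
int toy_drv_rx_tstamp(struct toy_ts_latch *l, struct toy_rx_pkt *pkt)
{
	assert(l && pkt);
	if (!l->locked)
		return 0;
	l->locked = 0;			/* reading unlocks the latch */
	if (l->seq_id != pkt->seq_id)
		return 0;		/* stale stamp from a lost packet */
	pkt->hwtstamp = l->ns;
	return 1;
}
```

In this model, if packet 1 is stamped and then dropped, packet 2 arrives while the latch is still locked and gets no stamp; the driver sees the ID mismatch, discards the stale value, and packet 3 is stamped correctly.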
^ permalink raw reply
* Re: [net-next 07/12] ixgbe: Enable timesync clock-out feature for PPS support on X540
From: Richard Cochran @ 2012-05-11 5:36 UTC (permalink / raw)
To: Keller, Jacob E
Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <02874ECE860811409154E81DA85FBB5807794253@ORSMSX105.amr.corp.intel.com>
On Thu, May 10, 2012 at 09:54:52PM +0000, Keller, Jacob E wrote:
> > Since this function is called in every interrupt, I would check this flag
> > first thing.
> >
> Not sure what you mean? Check this before checking the other interrupts?
> I can do that.
I only meant that, assuming that the other interrupt sources are much
more frequent than the PPS, the normal case will be that the PPS flag
is not set.
It would be more efficient (that is, shorter ISR code path in normal
case) to check the flag first, perhaps like this.
	ixgbe_intr()
	{
		...
	#ifdef CONFIG_IXGBE_PTP
		if (eicr & IXGBE_EICR_TIMESYNC)
			ixgbe_ptp_check_pps_event(adapter, eicr);
	#endif
		...
	}
Thanks,
Richard
^ permalink raw reply
* Re: [net-next 06/12] ixgbe: Hardware Timestamping + PTP Hardware Clock (PHC)
From: Richard Cochran @ 2012-05-11 5:15 UTC (permalink / raw)
To: Keller, Jacob E
Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <02874ECE860811409154E81DA85FBB580779423F@ORSMSX105.amr.corp.intel.com>
On Thu, May 10, 2012 at 09:53:18PM +0000, Keller, Jacob E wrote:
> > > + /*
> > > + * If this bit is set, then the RX registers contain the time stamp. No
> > > + * other packet will be time stamped until we read these registers, so
> > > + * read the registers to make them available again. Because only one
> > > + * packet can be time stamped at a time, we know that the register
> > > + * values must belong to this one here and therefore we don't need to
> > > + * compare any of the additional attributes stored for it.
> >
> > I suspect that this assumption is wrong. What happens if the time stamping
> > logic locks a value but the packet is lost because the ring is full?
> >
> > BTW, the IGB driver also has this defect.
> >
>
> Note how I read the rx registers first? So it will always clear the value.
> That should unlock the value for the next rx stamp packet.
1. Hw recognizes ptp event packet, locks time stamp
2. Hw drops packet because queue is full
3. No more time stamps are ever generated
Can this happen? The docs seem to say it can.
Richard
^ permalink raw reply
* Re: [PATCH 01/12] netvm: Prevent a stream-specific deadlock
From: David Miller @ 2012-05-11 5:10 UTC (permalink / raw)
To: mgorman
Cc: akpm, linux-mm, netdev, linux-nfs, linux-kernel, Trond.Myklebust,
neilb, hch, a.p.zijlstra, michaelc, emunson
In-Reply-To: <1336658065-24851-2-git-send-email-mgorman@suse.de>
From: Mel Gorman <mgorman@suse.de>
Date: Thu, 10 May 2012 14:54:14 +0100
> It could happen that all !SOCK_MEMALLOC sockets have buffered so
> much data that we're over the global rmem limit. This will prevent
> SOCK_MEMALLOC buffers from receiving data, which will prevent userspace
> from running, which is needed to reduce the buffered data.
>
> Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
This introduces an invariant which I am not so sure is enforced.
With this change it is absolutely required that once a socket
becomes SOCK_MEMALLOC it must never _ever_ lose that attribute.
Otherwise we can end up liberating global rmem tokens which we never
actually took.
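The accounting hazard can be shown with a toy user-space model. It is not the code from the patch: `toy_pool`, `toy_sock` and the charge/uncharge helpers are hypothetical, and the model simply assumes that both paths consult the socket's current memalloc flag to decide whether the global counter is touched.

```c
#include <assert.h>

/* Toy global receive-memory pool. */
struct toy_pool {
	long rmem_tokens;	/* tokens currently charged against the limit */
};

struct toy_sock {
	int memalloc;		/* stand-in for SOCK_MEMALLOC */
};

/* Charge n tokens, unless the socket is exempt from the global limit. */
void toy_charge(struct toy_pool *p, const struct toy_sock *sk, long n)
{
	assert(p && sk && n >= 0);
	if (!sk->memalloc)
		p->rmem_tokens += n;
}

/* Release n tokens, again consulting the flag as it is *now*. */
void toy_uncharge(struct toy_pool *p, const struct toy_sock *sk, long n)
{
	assert(p && sk && n >= 0);
	if (!sk->memalloc)
		p->rmem_tokens -= n;
}
```

If a socket is charged while exempt and the flag is cleared before the matching uncharge, the pool ends up below zero: tokens are released that were never taken, which is exactly why the flag must never be dropped once set.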
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: [PATCH] wlcore: wlcore should depend on MAC80211
From: Luciano Coelho @ 2012-05-11 5:06 UTC (permalink / raw)
To: Sasha Levin
Cc: linville-2XuSBdqkA4R54TAoqtyWWQ,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1336711194-26883-1-git-send-email-levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On Fri, 2012-05-11 at 06:39 +0200, Sasha Levin wrote:
> wlcore can't be built if MAC80211 isn't set.
>
> Signed-off-by: Sasha Levin <levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
Thanks for your patch, Sasha! But this change was already submitted and
has been applied in the wl12xx.git tree. I've sent a pull request with
it already, but it hasn't been pulled into wireless-next and up.
http://www.spinics.net/lists/linux-next/msg19910.html
http://www.spinics.net/lists/linux-wireless/msg89583.html
--
Cheers,
Luca.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 00/17] Swap-over-NBD without deadlocking V10
From: David Miller @ 2012-05-11 5:04 UTC (permalink / raw)
To: mgorman
Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
michaelc, emunson
In-Reply-To: <1336657510-24378-1-git-send-email-mgorman@suse.de>
Ok, I'm generally happy with the networking parts.
If you address my feedback I'll sign off on it.
The next question is whose tree this stuff goes through :-)
^ permalink raw reply
* Re: [PATCH 13/17] netvm: Set PF_MEMALLOC as appropriate during SKB processing
From: David Miller @ 2012-05-11 5:03 UTC (permalink / raw)
To: mgorman
Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
michaelc, emunson
In-Reply-To: <1336657510-24378-14-git-send-email-mgorman@suse.de>
From: Mel Gorman <mgorman@suse.de>
Date: Thu, 10 May 2012 14:45:06 +0100
> In order to make sure pfmemalloc packets receive all memory
> needed to proceed, ensure processing of pfmemalloc SKBs happens
> under PF_MEMALLOC. This is limited to a subset of protocols that
> are expected to be used for writing to swap. Taps are not allowed to
> use PF_MEMALLOC as these are expected to communicate with userspace
> processes which could be paged out.
>
> [a.p.zijlstra@chello.nl: Ideas taken from various patches]
> [jslaby@suse.cz: Lock imbalance fix]
> Signed-off-by: Mel Gorman <mgorman@suse.de>
This adds more code where we're modifying task->flags from software
interrupt context. I'm not convinced that's safe.
Also, this starts to add new tests in the fast paths.
Most of the time they are not going to trigger at all.
Please use the static branch I asked you to add in a previous
patch to mitigate this.
^ permalink raw reply
* Re: [PATCH 12/17] netvm: Propagate page->pfmemalloc from netdev_alloc_page to skb
From: David Miller @ 2012-05-11 5:01 UTC (permalink / raw)
To: mgorman
Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
michaelc, emunson
In-Reply-To: <1336657510-24378-13-git-send-email-mgorman@suse.de>
From: Mel Gorman <mgorman@suse.de>
Date: Thu, 10 May 2012 14:45:05 +0100
> +/**
> + * propagate_pfmemalloc_skb - Propagate pfmemalloc if skb is allocated after RX page
> + * @page: The page that was allocated from netdev_alloc_page
> + * @skb: The skb that may need pfmemalloc set
> + */
> +static inline void propagate_pfmemalloc_skb(struct page *page,
> + struct sk_buff *skb)
Please use consistent prefixes in the names for new interfaces.
This one should probably be named "skb_propagate_pfmemalloc()" and
go into skbuff.h since it needs no knowledge of netdevices.
In fact all of these routines are about propagation of attributes
into SKBs, regardless of any netdevice details.
Therefore they should probably all be named skb_*() and go into
skbuff.h
^ permalink raw reply
* Re: [PATCH 10/17] netvm: Allow skb allocation to use PFMEMALLOC reserves
From: David Miller @ 2012-05-11 4:57 UTC (permalink / raw)
To: mgorman
Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
michaelc, emunson
In-Reply-To: <1336657510-24378-11-git-send-email-mgorman@suse.de>
From: Mel Gorman <mgorman@suse.de>
Date: Thu, 10 May 2012 14:45:03 +0100
> +/* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
> +bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
I know this gets added in an earlier patch, but it seems slightly
overkill to have a function call just for a simple bit test.
> +extern atomic_t memalloc_socks;
> +static inline int sk_memalloc_socks(void)
> +{
> + return atomic_read(&memalloc_socks);
> +}
Please change this to be a static branch.
> + skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask,
> + SKB_ALLOC_RX, NUMA_NO_NODE);
Please fix the argument indentation.
> + data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
> + gfp_mask, NUMA_NO_NODE, NULL);
Likewise.
> - struct sk_buff *n = alloc_skb(newheadroom + skb->len + newtailroom,
> - gfp_mask);
> + struct sk_buff *n = __alloc_skb(newheadroom + skb->len + newtailroom,
> + gfp_mask, skb_alloc_rx_flag(skb),
> + NUMA_NO_NODE);
Likewise.
> - nskb = alloc_skb(hsize + doffset + headroom,
> - GFP_ATOMIC);
> + nskb = __alloc_skb(hsize + doffset + headroom,
> + GFP_ATOMIC, skb_alloc_rx_flag(skb),
> + NUMA_NO_NODE);
Likewise.
^ permalink raw reply
* Re: [PATCH 09/17] netvm: Allow the use of __GFP_MEMALLOC by specific sockets
From: David Miller @ 2012-05-11 4:50 UTC (permalink / raw)
To: mgorman
Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
michaelc, emunson
In-Reply-To: <1336657510-24378-10-git-send-email-mgorman@suse.de>
From: Mel Gorman <mgorman@suse.de>
Date: Thu, 10 May 2012 14:45:02 +0100
> Allow specific sockets to be tagged SOCK_MEMALLOC and use
> __GFP_MEMALLOC for their allocations. These sockets will be able to go
> below watermarks and allocate from the emergency reserve. Such sockets
> are to be used to service the VM (iow. to swap over). They must be
> handled kernel side, exposing such a socket to user-space is a bug.
>
> There is a risk that the reserves be depleted so for now, the
> administrator is responsible for increasing min_free_kbytes as
> necessary to prevent deadlock for their workloads.
>
> [a.p.zijlstra@chello.nl: Original patches]
> Signed-off-by: Mel Gorman <mgorman@suse.de>
After sk_allocation() is adjusted to be sk_gfp_atomic() as I suggested
in my feedback for patch #8, this is fine.
^ permalink raw reply
* Re: [PATCH 08/17] net: Introduce sk_allocation() to allow addition of GFP flags depending on the individual socket
From: David Miller @ 2012-05-11 4:49 UTC (permalink / raw)
To: mgorman
Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
michaelc, emunson
In-Reply-To: <1336657510-24378-9-git-send-email-mgorman@suse.de>
From: Mel Gorman <mgorman@suse.de>
Date: Thu, 10 May 2012 14:45:01 +0100
> Introduce sk_allocation(), this function allows to inject sock specific
> flags to each sock related allocation. It is only used on allocation
> paths that may be required for writing pages back to network storage.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
This is still a little bit more than it needs to be.
You are trying to propagate a single bit from sk->sk_allocation into
all of the annotated socket memory allocation sites.
But many of them use sk->sk_allocation already. In fact all of them
that use a variable rather than a constant GFP_* satisfy this
invariant.
All of those annotations are therefore spurious, and probably end up
generating unnecessary ORs of that special bit in at least some
cases.
What you really, therefore, care about are the GFP_FOO cases. And in
fact those are all GFP_ATOMIC. So make something that says what it
is that you want, a GFP_ATOMIC with some socket specified bits |'d
in.
Something like this:
static inline gfp_t sk_gfp_atomic(struct sock *sk)
{
	return GFP_ATOMIC | (sk->sk_allocation & __GFP_MEMALLOC);
}
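The argument can be checked with a small user-space model. The flag values below are stand-ins chosen for illustration, not the kernel's, and `toy_sk_allocation()` is only an assumed shape for the wrapper under review (OR the socket's MEMALLOC bit into a caller-supplied mask); the actual patch is not quoted in this thread.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Stand-in bit values for illustration only; the kernel's differ. */
#define TOY_GFP_ATOMIC		0x20u
#define TOY_GFP_KERNEL		0xd0u
#define TOY_GFP_MEMALLOC	0x2000u

struct toy_sock {
	gfp_t sk_allocation;
};

/* Assumed shape of the reviewed wrapper: add the socket's MEMALLOC bit. */
gfp_t toy_sk_allocation(const struct toy_sock *sk, gfp_t gfp_mask)
{
	assert(sk);
	return gfp_mask | (sk->sk_allocation & TOY_GFP_MEMALLOC);
}

/* The suggested helper: GFP_ATOMIC plus the socket-specific bit. */
gfp_t toy_sk_gfp_atomic(const struct toy_sock *sk)
{
	assert(sk);
	return TOY_GFP_ATOMIC | (sk->sk_allocation & TOY_GFP_MEMALLOC);
}
```

In this model, wrapping a call site that already passes `sk->sk_allocation` changes nothing, since the bit is already present, while the constant-`GFP_ATOMIC` sites get the same result from either helper.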
You'll also have to make your networking patches conform to the
networking subsystem coding style.
For example:
> - skb = sock_wmalloc(sk, MAX_TCP_HEADER + 15 + s_data_desired, 1, GFP_ATOMIC);
> + skb = sock_wmalloc(sk, MAX_TCP_HEADER + 15 + s_data_desired, 1,
> + sk_allocation(sk, GFP_ATOMIC));
The sk_allocation() argument has to line up with the first column
after the opening parenthesis of the function call. You can't just
use all TAB characters. And this all TABs thing looks extremely ugly
to boot.
> - newnp->pktoptions = skb_clone(treq->pktopts, GFP_ATOMIC);
> + newnp->pktoptions = skb_clone(treq->pktopts,
> + sk_allocation(sk, GFP_ATOMIC));
Same here.
What's really funny to me is that in several cases elsewhere in this
patch you get it right.
^ permalink raw reply
* [PATCH] wlcore: wlcore should depend on MAC80211
From: Sasha Levin @ 2012-05-11 4:39 UTC (permalink / raw)
To: coelho, linville; +Cc: linux-wireless, netdev, linux-kernel, Sasha Levin
wlcore can't be built if MAC80211 isn't set.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
drivers/net/wireless/ti/wlcore/Kconfig | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/wireless/ti/wlcore/Kconfig b/drivers/net/wireless/ti/wlcore/Kconfig
index 9d04c38..54156b0 100644
--- a/drivers/net/wireless/ti/wlcore/Kconfig
+++ b/drivers/net/wireless/ti/wlcore/Kconfig
@@ -1,6 +1,6 @@
 config WLCORE
 	tristate "TI wlcore support"
-	depends on WL_TI && GENERIC_HARDIRQS
+	depends on WL_TI && GENERIC_HARDIRQS && MAC80211
 	depends on INET
 	select FW_LOADER
 	---help---
--
1.7.8.5
^ permalink raw reply related
* Re: [PATCH 05/17] mm: allow PF_MEMALLOC from softirq context
From: David Miller @ 2012-05-11 4:39 UTC (permalink / raw)
To: mgorman
Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
michaelc, emunson
In-Reply-To: <1336657510-24378-6-git-send-email-mgorman@suse.de>
From: Mel Gorman <mgorman@suse.de>
Date: Thu, 10 May 2012 14:44:58 +0100
> This is needed to allow network softirq packet processing to make
> use of PF_MEMALLOC.
>
> Currently softirq context cannot use PF_MEMALLOC due to it not being
> associated with a task, and therefore not having task flags to fiddle
> with - thus the gfp to alloc flag mapping ignores the task flags when
> in interrupts (hard or soft) context.
>
> Allowing softirqs to make use of PF_MEMALLOC therefore requires some
> trickery. We basically borrow the task flags from whatever process
> happens to be preempted by the softirq.
>
> So we modify the gfp to alloc flags mapping to not exclude task flags
> in softirq context, and modify the softirq code to save, clear and
> restore the PF_MEMALLOC flag.
>
> The save and clear, ensures the preempted task's PF_MEMALLOC flag
> doesn't leak into the softirq. The restore ensures a softirq's
> PF_MEMALLOC flag cannot leak back into the preempted process.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
We're now making changes to task->flags from both base and
softirq context, but with non-atomic operations and no other
kind of synchronization.
As far as I can tell, this has to be racy.
If this works via some magic combination of invariants, you
absolutely have to document this, verbosely.
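For reference, the save/clear/restore sequence described in the quoted changelog can be modelled in user space roughly as follows. This is a minimal sketch, not the patch's code: struct task, run_softirq_model(), handler_sets_memalloc() and the PF_MEMALLOC value are all hypothetical. Note that every update to flags here is a plain read-modify-write, which is the kind of access the race concern above is about.

```c
/* Hypothetical user-space model; not the kernel's softirq code. */
#define PF_MEMALLOC	0x800u	/* illustrative value only */

struct task {
	unsigned int flags;
};

typedef void (*softirq_fn)(struct task *);

static void run_softirq_model(struct task *curr, softirq_fn handler)
{
	/* Save the preempted task's bit. */
	unsigned int saved = curr->flags & PF_MEMALLOC;

	/* Clear it so it cannot leak into the handler (non-atomic). */
	curr->flags &= ~PF_MEMALLOC;

	handler(curr);

	/* Drop whatever the handler set, then restore the saved bit. */
	curr->flags = (curr->flags & ~PF_MEMALLOC) | saved;
}

/* Example handler that turns the bit on while it runs. */
static void handler_sets_memalloc(struct task *t)
{
	t->flags |= PF_MEMALLOC;
}
```

The model only shows the intended before/after state of the flag in a single thread of execution; it says nothing about whether the real sequence is safe against concurrent updates from base context.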
^ permalink raw reply
* Re: [PATCH net-next] 6lowpan: IPv6 link local address
From: David Miller @ 2012-05-11 3:38 UTC (permalink / raw)
To: alex.bluesman.smirnov; +Cc: netdev
In-Reply-To: <1336656352-19518-1-git-send-email-alex.bluesman.smirnov@gmail.com>
From: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Date: Thu, 10 May 2012 17:25:52 +0400
> According to the RFC4944 (Transmission of IPv6 Packets over
> IEEE 802.15.4 Networks), chapter 7:
>
> The IPv6 link-local address [RFC4291] for an IEEE 802.15.4 interface
> is formed by appending the Interface Identifier, as defined above, to
> the prefix FE80::/64.
>
>  10 bits            54 bits                  64 bits
> +----------+-----------------------+----------------------------+
> |1111111010|         (zeros)       |    Interface Identifier    |
> +----------+-----------------------+----------------------------+
>
> This patch adds IPv6 address generation support for the 6lowpan
> interfaces.
>
> Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Applied.
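The address layout quoted from RFC 4944 can be sketched in plain C as below. This is an illustration of the bit layout only, not the code from the applied patch; lowpan_link_local() is a hypothetical name, and the caller is assumed to already hold the 64-bit Interface Identifier.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: form fe80::/64 followed by a 64-bit
 * Interface Identifier, per the layout quoted above.
 */
static void lowpan_link_local(uint8_t addr[16], const uint8_t iid[8])
{
	memset(addr, 0, 16);		/* the 54 zero bits, among others */
	addr[0] = 0xfe;			/* 1111 1110 */
	addr[1] = 0x80;			/* 10, then zero bits */
	memcpy(addr + 8, iid, 8);	/* low 64 bits: Interface Identifier */
}
```

The first ten bits 1111111010 span all of the first byte (0xfe) and the top two bits of the second (0x80), which is why only two bytes of the prefix are nonzero.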
^ permalink raw reply
* Re: [PATCH v13 net-next] codel: Controlled Delay AQM
From: David Miller @ 2012-05-11 3:36 UTC (permalink / raw)
To: eric.dumazet
Cc: dave.taht, nichols, van, codel, bloat, netdev, therbert, ycheng,
mattmathis, shemminger
In-Reply-To: <1336672285.31653.102.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 10 May 2012 19:51:25 +0200
> From: Eric Dumazet <edumazet@google.com>
>
> An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.
...
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
Applied, but there was a lot of trailing whitespace and a few
comments mis-formatted which I had to fix up.
Thanks.
^ permalink raw reply
* Re: [PATCH net-next] net_sched: update bstats in dequeue()
From: David Miller @ 2012-05-11 3:36 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1336664194.31653.8.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 10 May 2012 17:36:34 +0200
> From: Eric Dumazet <edumazet@google.com>
>
> Class bytes/packets stats can be misleading because they are updated in
> enqueue() while packet might be dropped later.
>
> We already fixed all qdiscs but sch_atm.
>
> This patch makes the final cleanup.
>
> class rate estimators can now match qdisc ones.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied.
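The accounting point in the quoted changelog can be illustrated with a toy queue. This is a sketch under stated assumptions: struct toy_qdisc and the toy_* functions are invented for this example and are unrelated to the real sch_atm code. Byte and packet counters are bumped only in the dequeue path, so a packet dropped after being queued never shows up in them.

```c
#include <stddef.h>

#define QLEN 4

struct pkt {
	unsigned int len;
};

/* Hypothetical toy queue with its own counters. */
struct toy_qdisc {
	struct pkt q[QLEN];
	size_t head, count;
	unsigned long long bytes;
	unsigned long packets;
	unsigned long drops;
};

/* Queue a packet; no byte/packet accounting happens here. */
static int toy_enqueue(struct toy_qdisc *s, struct pkt p)
{
	if (s->count == QLEN) {
		s->drops++;
		return -1;
	}
	s->q[(s->head + s->count) % QLEN] = p;
	s->count++;
	return 0;
}

/* Drop the most recently queued packet, e.g. under pressure. */
static int toy_drop_tail(struct toy_qdisc *s)
{
	if (s->count == 0)
		return -1;
	s->count--;
	s->drops++;
	return 0;
}

/* Account for a packet only once it actually leaves the queue. */
static int toy_dequeue(struct toy_qdisc *s, struct pkt *out)
{
	if (s->count == 0)
		return -1;
	*out = s->q[s->head];
	s->head = (s->head + 1) % QLEN;
	s->count--;
	s->bytes += out->len;
	s->packets++;
	return 0;
}
```

Had the counters been updated in toy_enqueue() instead, a packet removed by toy_drop_tail() would still be counted as sent, which is the misleading behaviour the changelog describes.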
^ permalink raw reply
* Re: [PATCH 3/3] net,drivers/net: Convert compare_ether_addr_64bits to ether_addr_equal_64bits
From: David Miller @ 2012-05-11 3:35 UTC (permalink / raw)
To: joe; +Cc: fubar, andy, kaber, netdev, linux-kernel
In-Reply-To: <acb363688c8ed729cc61e288605cdb4aa4089149.1336618708.git.joe@perches.com>
From: Joe Perches <joe@perches.com>
Date: Wed, 9 May 2012 20:04:04 -0700
> Use the new bool function ether_addr_equal_64bits to add
> some clarity and reduce the likelihood for misuse of
> compare_ether_addr_64bits for sorting.
>
> Done via cocci script:
...
> Signed-off-by: Joe Perches <joe@perches.com>
Applied.
^ permalink raw reply
* Re: [PATCH 2/3] etherdevice.h: Add ether_addr_equal_64bits
From: David Miller @ 2012-05-11 3:35 UTC (permalink / raw)
To: joe; +Cc: linux-kernel, netdev
In-Reply-To: <7535e70ceb63f7da56f73cb37d4b269cd7798583.1336618708.git.joe@perches.com>
From: Joe Perches <joe@perches.com>
Date: Wed, 9 May 2012 20:04:03 -0700
> Add an optimized boolean function to check if
> 2 ethernet addresses are the same.
>
> This is to avoid any confusion about compare_ether_addr_64bits
> returning an unsigned, and not being able to use the
> compare_ether_addr_64bits function for sorting ala memcmp.
>
> Signed-off-by: Joe Perches <joe@perches.com>
Applied.
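The distinction drawn in the quoted changelog can be sketched as follows. These toy_* functions are hypothetical user-space models, not the etherdevice.h implementations, and they ignore the 64-bit load optimization entirely. The compare-style helper returns zero for equal addresses and some nonzero unsigned value otherwise, so its result carries no ordering and cannot stand in for memcmp in a sort; the boolean wrapper makes the intended use explicit.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_ALEN 6

/*
 * Hypothetical model of a compare-style helper: 0 when equal,
 * nonzero otherwise. The nonzero value says nothing about which
 * address is "greater".
 */
static unsigned int toy_compare_ether_addr(const uint8_t *a, const uint8_t *b)
{
	unsigned int diff = 0;

	for (int i = 0; i < ETH_ALEN; i++)
		diff |= (unsigned int)(a[i] ^ b[i]);
	return diff;
}

/* Boolean form: true only when all six bytes match. */
static bool toy_ether_addr_equal(const uint8_t *a, const uint8_t *b)
{
	return !toy_compare_ether_addr(a, b);
}
```

Swapping the arguments of the compare-style helper gives the same nonzero result, which shows why it is unusable as a sort comparator.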
^ permalink raw reply
* Re: [PATCH 1/3] drivers/net: Convert compare_ether_addr to ether_addr_equal
From: David Miller @ 2012-05-11 3:35 UTC (permalink / raw)
To: joe; +Cc: linux-kernel, netdev, linux-wireless, ath5k-devel, ath9k-devel
In-Reply-To: <7c9881a67c52c2f218480b6742155b6d6928122d.1336618708.git.joe@perches.com>
From: Joe Perches <joe@perches.com>
Date: Wed, 9 May 2012 20:17:46 -0700
> Use the new bool function ether_addr_equal to add
> some clarity and reduce the likelihood for misuse
> of compare_ether_addr for sorting.
>
> Done via cocci script:
...
> Signed-off-by: Joe Perches <joe@perches.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next] be2net: avoid disabling sriov while VFs are assigned
From: David Miller @ 2012-05-11 3:35 UTC (permalink / raw)
To: sathya.perla; +Cc: netdev
In-Reply-To: <647fcd25-8a85-4d63-a1ef-8b867832c2c7@exht1.ad.emulex.com>
From: Sathya Perla <sathya.perla@emulex.com>
Date: Wed, 9 May 2012 11:11:24 +0530
> Calling pci_disable_sriov() while VFs are assigned to VMs causes
> kernel panic. This patch uses PCI_DEV_FLAGS_ASSIGNED bit state of the
> VF's pci_dev to avoid this. Also, the unconditional function reset cmd
> issued on a PF probe can delete the VF configuration for the
> previously enabled VFs. A scratchpad register is now used to issue a
> function reset only when needed (i.e., in a crash dump scenario.)
>
> Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Applied.
^ permalink raw reply