* [IPROUTE2 0/2] Add ECN support to tc-netem
From: Vijay Subramanian @ 2012-05-16 23:51 UTC
To: netdev; +Cc: Eric Dumazet, Stephen Hemminger, Vijay Subramanian
A recent patch to the net-next kernel from Eric Dumazet (e4ae004b84b "netem:
add ECN capability") made it possible for netem to mark packets with ECN
instead of dropping them. These two patches add support for this to
iproute2/tc and update the manpage.
Vijay Subramanian (2):
Update tc-netem manpage to add ecn capability
tc-netem: Add support for ECN packet marking
include/linux/pkt_sched.h | 1 +
man/man8/tc-netem.8 | 8 ++++++--
tc/q_netem.c | 25 +++++++++++++++++++++++++
3 files changed, 32 insertions(+), 2 deletions(-)
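As a rough illustration of what "mark instead of drop" means, the sketch below expresses the decision in terms of the RFC 3168 ECN codepoints (the low two bits of the IPv4 TOS byte). It is a simplified model, not the kernel's netem code: the function and enum names are hypothetical, and a real implementation also has to update the IPv4 header checksum after changing the TOS byte.

```c
#include <stdbool.h>
#include <stdint.h>

/* RFC 3168 ECN codepoints, stored in the low two bits of the IPv4 TOS byte. */
enum ecn_codepoint {
	ECN_NOT_ECT = 0x0, /* transport is not ECN-capable */
	ECN_ECT1    = 0x1,
	ECN_ECT0    = 0x2,
	ECN_CE      = 0x3, /* congestion experienced */
};

enum loss_verdict { VERDICT_DROP, VERDICT_MARK };

/*
 * Hypothetical helper, called only after the loss model has decided that
 * this packet should be "lost". With ECN enabled, ECN-capable packets are
 * marked CE and delivered; everything else is still dropped.
 * NOTE: this sketch does not recompute the IPv4 header checksum.
 */
static enum loss_verdict on_emulated_loss(uint8_t *tos, bool ecn_enabled)
{
	uint8_t ecn = *tos & 0x3;

	if (ecn_enabled && ecn != ECN_NOT_ECT) {
		*tos |= ECN_CE; /* ECT(0), ECT(1) or CE all become CE */
		return VERDICT_MARK;
	}
	return VERDICT_DROP;
}
```

Packets that are not ECN-capable fall back to being dropped, so enabling ECN never makes the emulated loss disappear for non-ECN traffic.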
* [IPROUTE2 1/2] Update tc-netem manpage to add ecn capability
From: Vijay Subramanian @ 2012-05-16 23:51 UTC
To: netdev; +Cc: Eric Dumazet, Stephen Hemminger, Vijay Subramanian
In-Reply-To: <1337212318-2100-1-git-send-email-subramanian.vijay@gmail.com>
This patch updates the netem manpage to describe how to use
netem to mark packets with ECN instead of dropping them.
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
---
man/man8/tc-netem.8 | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)
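The manpage text says ecn only takes effect together with a loss model, and patch 2/2 rejects it otherwise. A minimal sketch of that rule over a simplified token list: the function name is hypothetical, it uses exact strcmp() where tc uses prefix matching via matches(), and unlike the real check (which inspects opt.loss and loss_type) it only tests whether a "loss" keyword was given, not whether the loss value is non-zero.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified version of the "ecn needs a loss model" rule.
 * Returns 0 if the arguments are acceptable, -1 otherwise.
 */
static int check_ecn_args(int argc, const char *const argv[])
{
	bool want_ecn = false, have_loss = false;

	for (int i = 0; i < argc; i++) {
		if (strcmp(argv[i], "ecn") == 0)
			want_ecn = true;
		else if (strcmp(argv[i], "loss") == 0)
			have_loss = true;
	}
	if (want_ecn && !have_loss) {
		fprintf(stderr, "ecn requested without loss model\n");
		return -1;
	}
	return 0;
}
```

For example, the tokens "loss 1% ecn" pass, while "delay 10ms ecn" is rejected.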
diff --git a/man/man8/tc-netem.8 b/man/man8/tc-netem.8
index 39f8454..b0b7864 100644
--- a/man/man8/tc-netem.8
+++ b/man/man8/tc-netem.8
@@ -30,8 +30,8 @@ NetEm \- Network Emulator
.IR p13 " [ " p31 " [ " p32 " [ " p23 " [ " p14 "]]]] |"
.br
.RB " " gemodel
-.IR p " [ " r " [ " 1-h " [ " 1-k " ]]]"
-.BR " }"
+.IR p " [ " r " [ " 1-h " [ " 1-k " ]]] } "
+.RB " [ " ecn " ] "
.IR CORRUPT " := "
.B corrupt
@@ -102,6 +102,10 @@ model. As known, p and r are the transition probabilities between the bad and
the good states, 1-h is the loss probability in the bad state and 1-k is the
loss probability in the good state.
+.SS ecn
+can be used optionally to mark packets instead of dropping them. A loss model
+has to be used for this to be enabled.
+
.SS corrupt
allows the emulation of random noise introducing an error in a random position
for a chosen percent of packets. It is also possible to add a correlation
--
1.7.0.4
* [IPROUTE2 2/2] tc-netem: Add support for ECN packet marking
From: Vijay Subramanian @ 2012-05-16 23:51 UTC
To: netdev; +Cc: Eric Dumazet, Stephen Hemminger, Vijay Subramanian
In-Reply-To: <1337212318-2100-1-git-send-email-subramanian.vijay@gmail.com>
This patch adds support for making netem mark packets with ECN instead of
dropping them, so that the ECN marking feature recently added to the
kernel's netem qdisc can be used from tc.
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
---
include/linux/pkt_sched.h | 1 +
tc/q_netem.c | 26 ++++++++++++++++++++++++++
2 files changed, 27 insertions(+), 0 deletions(-)
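This patch sends the ECN setting as a netlink attribute (TCA_NETEM_ECN) carrying the value of present[TCA_NETEM_ECN], and netem_print_opt() refuses the attribute if its payload is shorter than the value it reads. The sketch below imitates that type-length-value layout with a hypothetical header struct standing in for struct rtattr (16-bit length including the header, 16-bit type, 4-byte alignment). The helper names and the attribute number are placeholders, not iproute2 APIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct rtattr: total length (header + payload) and type. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_ALIGN(n)  (((n) + 3) & ~(size_t)3)
#define ATTR_HDRLEN    ATTR_ALIGN(sizeof(struct attr_hdr))
#define DEMO_ATTR_ECN  7 /* placeholder type number for this sketch */

/* Write one 32-bit attribute at the start of buf.
 * Returns the number of bytes used, or 0 if it does not fit. */
static size_t attr_put_u32(uint8_t *buf, size_t cap, uint16_t type,
			   uint32_t val)
{
	struct attr_hdr h = {
		.len = (uint16_t)(ATTR_HDRLEN + sizeof(val)),
		.type = type,
	};
	size_t total = ATTR_ALIGN((size_t)h.len);

	if (total > cap)
		return 0;
	memset(buf, 0, total);
	memcpy(buf, &h, sizeof(h));
	memcpy(buf + ATTR_HDRLEN, &val, sizeof(val));
	return total;
}

/* Read a 32-bit attribute back, rejecting a wrong type and short or
 * truncated payloads (the same idea as the RTA_PAYLOAD() size check). */
static int attr_get_u32(const uint8_t *buf, size_t avail, uint16_t type,
			uint32_t *out)
{
	struct attr_hdr h;
	size_t hlen;

	if (avail < ATTR_HDRLEN)
		return -1;
	memcpy(&h, buf, sizeof(h));
	hlen = h.len;
	if (h.type != type || hlen < ATTR_HDRLEN || hlen > avail)
		return -1;
	if (hlen - ATTR_HDRLEN < sizeof(*out))
		return -1;
	memcpy(out, buf + ATTR_HDRLEN, sizeof(*out));
	return 0;
}
```

Checking the payload length before dereferencing is what keeps a malformed or older-format attribute from being read past its end.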
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 410b33d..ffe975c 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -509,6 +509,7 @@ enum {
TCA_NETEM_CORRUPT,
TCA_NETEM_LOSS,
TCA_NETEM_RATE,
+ TCA_NETEM_ECN,
__TCA_NETEM_MAX,
};
diff --git a/tc/q_netem.c b/tc/q_netem.c
index 360080c..f8489e9 100644
--- a/tc/q_netem.c
+++ b/tc/q_netem.c
@@ -38,6 +38,7 @@ static void explain(void)
" [ loss random PERCENT [CORRELATION]]\n" \
" [ loss state P13 [P31 [P32 [P23 P14]]]\n" \
" [ loss gemodel PERCENT [R [1-H [1-K]]]\n" \
+" [ ecn ]\n" \
" [ reorder PRECENT [CORRELATION] [ gap DISTANCE ]]\n" \
" [ rate RATE [PACKETOVERHEAD] [CELLSIZE] [CELLOVERHEAD]]\n");
}
@@ -326,6 +327,8 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
*argv);
return -1;
}
+ } else if (matches(*argv, "ecn") == 0) {
+ present[TCA_NETEM_ECN] = 1;
} else if (matches(*argv, "reorder") == 0) {
NEXT_ARG();
present[TCA_NETEM_REORDER] = 1;
@@ -437,6 +440,14 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
return -1;
}
+ if (present[TCA_NETEM_ECN]) {
+ if (opt.loss <= 0 && loss_type == NETEM_LOSS_UNSPEC) {
+ fprintf(stderr, "ecn requested without loss model\n");
+ explain();
+ return -1;
+ }
+ }
+
if (dist_data && (opt.latency == 0 || opt.jitter == 0)) {
fprintf(stderr, "distribution specified but no latency and jitter values\n");
explain();
@@ -454,6 +465,11 @@ static int netem_parse_opt(struct qdisc_util *qu, int argc, char **argv,
addattr_l(n, 1024, TCA_NETEM_REORDER, &reorder, sizeof(reorder)) < 0)
return -1;
+ if (present[TCA_NETEM_ECN] &&
+ addattr_l(n, 1024, TCA_NETEM_ECN, &present[TCA_NETEM_ECN],
+ sizeof(present[TCA_NETEM_ECN])) < 0)
+ return -1;
+
if (present[TCA_NETEM_CORRUPT] &&
addattr_l(n, 1024, TCA_NETEM_CORRUPT, &corrupt, sizeof(corrupt)) < 0)
return -1;
@@ -500,6 +516,7 @@ static int netem_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
const struct tc_netem_corrupt *corrupt = NULL;
const struct tc_netem_gimodel *gimodel = NULL;
const struct tc_netem_gemodel *gemodel = NULL;
+ int *ecn = NULL;
struct tc_netem_qopt qopt;
const struct tc_netem_rate *rate = NULL;
int len = RTA_PAYLOAD(opt) - sizeof(qopt);
@@ -548,6 +565,11 @@ static int netem_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
return -1;
rate = RTA_DATA(tb[TCA_NETEM_RATE]);
}
+ if (tb[TCA_NETEM_ECN]) {
+ if (RTA_PAYLOAD(tb[TCA_NETEM_ECN]) < sizeof(*ecn))
+ return -1;
+ ecn = RTA_DATA(tb[TCA_NETEM_ECN]);
+ }
}
fprintf(f, "limit %d", qopt.limit);
@@ -617,9 +639,13 @@ static int netem_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
fprintf(f, " celloverhead %d", rate->cell_overhead);
}
+ if (ecn)
+ fprintf(f, " ecn ");
+
if (qopt.gap)
fprintf(f, " gap %lu", (unsigned long)qopt.gap);
+
return 0;
}
--
1.7.0.4
* Re: [PATCH 0/7] netfilter updates for net-next (batch 3)
From: David Miller @ 2012-05-17 0:00 UTC
To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1337209604-3412-1-git-send-email-pablo@netfilter.org>
From: pablo@netfilter.org
Date: Thu, 17 May 2012 01:06:37 +0200
> The following patchset contains small updates for net-next, more relevantly:
>
> * One fix for potential NULL dereference in xt_HMARK by Dan Carpenter.
>
> * Conversion to use _ALL macro in xt_hashlimit as you suggested by
> Florian Westphal.
>
> * One fix for timeout overflow from Jozsef Kadlecsik.
>
> * Replace usage of modulus for hash calculation in xt_HMARK as you suggested
> from myself.
>
> You can pull these changes from:
>
> git://1984.lsi.us.es/net-next master
Pulled, thanks a lot!
* Re: [PATCH v5 2/2] decrement static keys on real destroy time
From: KAMEZAWA Hiroyuki @ 2012-05-17 0:07 UTC
To: Andrew Morton
Cc: Glauber Costa, cgroups, linux-mm, devel, netdev, Tejun Heo, Li Zefan,
Johannes Weiner, Michal Hocko
In-Reply-To: <20120516141342.911931e7.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
(2012/05/17 6:13), Andrew Morton wrote:
> On Fri, 11 May 2012 17:11:17 -0300
> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> wrote:
>
>> We call the destroy function when a cgroup starts to be removed,
>> such as by a rmdir event.
>>
>> However, because of our reference counters, some objects are still
>> inflight. Right now, we are decrementing the static_keys at destroy()
>> time, meaning that if we get rid of the last static_key reference,
>> some objects will still have charges, but the code to properly
>> uncharge them won't be run.
>>
>> This becomes a problem specially if it is ever enabled again, because
>> now new charges will be added to the staled charges making keeping
>> it pretty much impossible.
>>
>> We just need to be careful with the static branch activation:
>> since there is no particular preferred order of their activation,
>> we need to make sure that we only start using it after all
>> call sites are active. This is achieved by having a per-memcg
>> flag that is only updated after static_key_slow_inc() returns.
>> At this time, we are sure all sites are active.
>>
>> This is made per-memcg, not global, for a reason:
>> it also has the effect of making socket accounting more
>> consistent. The first memcg to be limited will trigger static_key()
>> activation, therefore, accounting. But all the others will then be
>> accounted no matter what. After this patch, only limited memcgs
>> will have its sockets accounted.
>
> So I'm scratching my head over what the actual bug is, and how
> important it is. AFAICT it will cause charging stats to exhibit some
> inaccuracy when memcg's are being torn down?
>
> I don't know how serious this in in the real world and so can't decide
> which kernel version(s) we should fix.
>
> When fixing bugs, please always fully describe the bug's end-user
> impact, so that I and others can make these sorts of decisions.
>
Ah, this was a bug report from me: TCP accounting can easily be broken.
Costa, could you include this?
==
The TCP memory controller uses a static branch to optimize the
limit=RESOURCE_MAX case: if every cgroup's limit is RESOURCE_MAX, resource
usage is not accounted at all. But this is currently buggy.
For example, run the following loop
# while sleep 1;do
echo 9223372036854775807 > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
echo 300M > /cgroup/memory/A/memory.kmem.tcp.limit_in_bytes;
done
and run a network application in cgroup A. TCP usage is then sometimes
accounted and sometimes not, because the static branch is flipped
frequently. As a result, tcp.usage_in_bytes becomes inconsistent, and a
WARN_ON() fires because res_counter->usage goes below 0.
==
kernel: ------------[ cut here ]----------
kernel: WARNING: at kernel/res_counter.c:96 res_counter_uncharge_locked+0x37/0x40()
<snip>
kernel: Pid: 17753, comm: bash Tainted: G W 3.3.0+ #99
kernel: Call Trace:
kernel: <IRQ> [<ffffffff8104cc9f>] warn_slowpath_common+0x7f/0xc0
kernel: [<ffffffff810d7e88>] ? rb_reserve__next_event+0x68/0x470
kernel: [<ffffffff8104ccfa>] warn_slowpath_null+0x1a/0x20
kernel: [<ffffffff810b4e37>] res_counter_uncharge_locked+0x37/0x40
...
==
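The failure mode above comes down to deciding at uncharge time, from a global switch, whether to uncharge, instead of remembering whether that particular object was charged. The toy model below (plain C, not kernel code; every name is hypothetical) shows how a socket allocated while accounting was off can drive the counter negative once accounting is switched back on, and how recording the decision per object avoids it. It illustrates the general hazard only; it is not the mechanism the patch under discussion uses.

```c
#include <stdbool.h>

/* Toy stand-in for the static key: is socket accounting switched on? */
static bool accounting_on;

/* Toy stand-in for res_counter. */
struct toy_counter {
	long usage;
	int underflows; /* how often usage went negative (the WARN case) */
};

struct toy_sock {
	long size;
	bool charged; /* was this socket charged when it was created? */
};

static void toy_sock_alloc(struct toy_counter *c, struct toy_sock *s, long size)
{
	s->size = size;
	s->charged = accounting_on;
	if (s->charged)
		c->usage += size;
}

/* Problematic pattern: asks the global switch *now* whether to uncharge. */
static void toy_sock_free_global(struct toy_counter *c, struct toy_sock *s)
{
	if (accounting_on) {
		c->usage -= s->size;
		if (c->usage < 0)
			c->underflows++;
	}
}

/* Per-object pattern: uncharges exactly what was charged. */
static void toy_sock_free_per_object(struct toy_counter *c, struct toy_sock *s)
{
	if (s->charged)
		c->usage -= s->size;
}
```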
* Re: [PATCH 1/1] smsc95xx: add FLAG_POINTTOPOINT flag for driver_info
From: Xiao Jiang @ 2012-05-17 2:23 UTC
To: Ming Lei; +Cc: steve.glendinning, gregkh, netdev, linux-usb, linux-kernel
In-Reply-To: <CACVXFVNzmq74BKYZN1SpXYULneV2ASmniMhs4LhevPm-XgSJpg@mail.gmail.com>
Ming Lei wrote:
> On Wed, May 16, 2012 at 4:01 PM, <jgq516@gmail.com> wrote:
>
>> From: Xiao Jiang <jgq516@gmail.com>
>>
>> commit c26134 introduced FLAG_POINTTOPOINT flag for USB ethernet devices
>> which possibly use "usb%d" names, add this flag to make sure pandaboard
>> can mount nfs with smsc95xx NIC.
>>
>
> Without the flag, I also can mount nfs successfully on my Pandaboard...
>
>
I pulled the latest tree
(git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git,
commit 0e93b4b304ae052ba1bc73f6d34a68556fe93429) and enabled the related
options (USB_NET_SMSC95XX, USB_EHCI_HCD and USB_EHCI_HCD_OMAP) on top of
omap2plus_config. However, the kernel still can't mount NFS; please see
the log below.
[ 3.114105] smsc95xx v1.0.4
[ 4.533752] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-ehci-omap.0-1.1, smsc95xx USB 2.0 Ethernet, fe:b9:1b:07:8e:d1
[ 108.854217] VFS: Unable to mount root fs via NFS, trying floppy.
[ 108.861114] VFS: Cannot open root device "nfs" or unknown-block(2,0): error -6
[ 108.868713] Please append a correct "root=" boot option; here are the available partitions:
[ 108.877655] b300 7761920 mmcblk0 driver: mmcblk
[ 108.883239] b301 40131 mmcblk0p1 00000000-0000-0000-0000-000000000mmcblk0p1
[ 108.891662] b302 7719232 mmcblk0p2 00000000-0000-0000-0000-000000000mmcblk0p2
[ 108.900146] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0)
BTW, I tested this on an OMAP4430 ES2.2 Pandaboard, and the issue is
solved by applying the patch.
Is there something I missed? Thanks.
Regards,
Xiao
>> Signed-off-by: Xiao Jiang <jgq516@gmail.com>
>> ---
>> drivers/net/usb/smsc95xx.c | 3 ++-
>> 1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
>> index 94ae669..e158288 100644
>> --- a/drivers/net/usb/smsc95xx.c
>> +++ b/drivers/net/usb/smsc95xx.c
>> @@ -1192,7 +1192,8 @@ static const struct driver_info smsc95xx_info = {
>> .rx_fixup = smsc95xx_rx_fixup,
>> .tx_fixup = smsc95xx_tx_fixup,
>> .status = smsc95xx_status,
>> - .flags = FLAG_ETHER | FLAG_SEND_ZLP | FLAG_LINK_INTR,
>> + .flags = FLAG_ETHER | FLAG_POINTTOPOINT | FLAG_SEND_ZLP |
>> + FLAG_LINK_INTR,
>> };
>>
>> static const struct usb_device_id products[] = {
>> --
>> 1.7.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>>
>
>
> Thanks,
>
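For context on the "usb%d" naming that FLAG_POINTTOPOINT refers to: as we understand the usbnet probe logic introduced with that flag (treat the exact condition as an assumption and check it against drivers/net/usb/usbnet.c), a FLAG_ETHER device keeps the default "usb%d" name only when FLAG_POINTTOPOINT is set and its MAC address is locally administered (bit 0x02 of the first octet); otherwise it is named "eth%d". The MAC in the log above, fe:b9:1b:07:8e:d1, has that bit set. A sketch of the rule with placeholder flag values (not the real FLAG_* constants); other naming cases are left out.

```c
#include <stdint.h>

/* Placeholder flag values for this sketch, not the kernel's FLAG_* values. */
#define SKETCH_FLAG_ETHER        0x1u
#define SKETCH_FLAG_POINTTOPOINT 0x2u

/* Returns the name template picked under the assumption described above. */
static const char *netdev_name_template(unsigned int flags, const uint8_t mac[6])
{
	int locally_administered = (mac[0] & 0x02) != 0;

	if ((flags & SKETCH_FLAG_ETHER) &&
	    (!(flags & SKETCH_FLAG_POINTTOPOINT) || !locally_administered))
		return "eth%d";
	return "usb%d"; /* the default name template */
}
```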
* Re: [V2 PATCH 9/9] vhost: zerocopy: poll vq in zerocopy callback
From: Jason Wang @ 2012-05-17 2:50 UTC
To: Shirley Ma
Cc: Michael S. Tsirkin, eric.dumazet, netdev, linux-kernel, ebiederm,
davem
In-Reply-To: <1337189525.10741.24.camel@oc3660625478.ibm.com>
On 05/17/2012 01:32 AM, Shirley Ma wrote:
> On Wed, 2012-05-16 at 18:14 +0300, Michael S. Tsirkin wrote:
>> On Wed, May 16, 2012 at 08:10:27AM -0700, Shirley Ma wrote:
>>> On Wed, 2012-05-16 at 10:58 +0800, Jason Wang wrote:
>>>>>> drivers/vhost/vhost.c | 1 +
>>>>>> 1 files changed, 1 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>> index 947f00d..7b75fdf 100644
>>>>>> --- a/drivers/vhost/vhost.c
>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>> @@ -1604,6 +1604,7 @@ void vhost_zerocopy_callback(void *arg)
>>>>>> struct vhost_ubuf_ref *ubufs = ubuf->arg;
>>>>>> struct vhost_virtqueue *vq = ubufs->vq;
>>>>>>
>>>>>> + vhost_poll_queue(&vq->poll);
>>>>>> /* set len = 1 to mark this desc buffers done DMA */
>>>>>> vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
>>>>>> kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
>>>>> Doing so, we might have redundant vhost_poll_queue(). Do you
>> know in
>>>>> which scenario there might be missing of adding and signaling
>> during
>>>>> zerocopy?
>>>> Yes, as we only do signaling and adding during tx work, if there's
>> no
>>>> tx
>>>> work when the skb were sent, we may lose the opportunity to let
>> guest
>>>> know about the completion. It's easy to be reproduced with netperf
>>>> test.
>>> The reason which host signals guest is to free guest tx buffers, if
>>> there is no tx work, then it's not necessary to signal the guest
>> unless
>>> guest runs out of memory. The pending buffers will be released
>>> virtio_net device gone.
Looks like we only free the skbs in .ndo_start_xmit().
>>>
>>> What's the behavior of netperf test when you hit this situation?
>>>
>>> Thanks
>>> Shirley
>> IIRC guest networking seems to be lost.
> It seems vhost_enable_notify is missing in somewhere else?
>
> Thanks
> Shirley
>
The problem is that the guest driver may stop the tx queue when there is
not enough capacity to place packets; at that point it depends on the tx
interrupt to re-enable the queue. So if we don't poll the vhost during the
callback, the guest may miss the tx interrupt that re-enables the tx queue,
which could stall the whole tx queue.
Thanks
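The stall described above is a lost-wakeup pattern: the guest stops its tx queue when the ring is full and only restarts it when it is signalled, so a completion that is recorded but never signalled leaves the queue stopped. The toy model below is plain C with hypothetical names, not virtio or vhost code. In the real patch the "kick" corresponds to vhost_poll_queue() scheduling the tx work, which then signals the guest; that indirection is collapsed into a single call here.

```c
#include <stdbool.h>

/* Toy guest tx ring; not virtio or vhost code. */
struct toy_txq {
	int in_flight;
	int capacity;
	bool stopped;
};

/* Guest transmit: stop the queue once the ring is full. */
static bool toy_guest_xmit(struct toy_txq *q)
{
	if (q->stopped)
		return false;
	if (++q->in_flight == q->capacity)
		q->stopped = true;
	return true;
}

/* Guest tx interrupt: reclaim completed buffers and restart the queue.
 * Simplification: assumes every in-flight buffer has completed. */
static void toy_guest_tx_interrupt(struct toy_txq *q)
{
	q->in_flight = 0;
	q->stopped = false;
}

/* Host-side completion: the DMA is done, but the guest only learns about
 * it if the host side is kicked to signal it. */
static void toy_zerocopy_done(struct toy_txq *q, bool kick)
{
	if (kick)
		toy_guest_tx_interrupt(q);
}
```

Without the kick, nothing else in this model ever restarts the queue, which mirrors the "guest networking seems to be lost" symptom reported in the thread.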
* Re: [V2 PATCH 2/9] macvtap: zerocopy: fix truesize underestimation
From: Jason Wang @ 2012-05-17 2:59 UTC
To: Shirley Ma; +Cc: eric.dumazet, mst, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <1337180585.10741.6.camel@oc3660625478.ibm.com>
On 05/16/2012 11:03 PM, Shirley Ma wrote:
> On Wed, 2012-05-16 at 11:04 +0800, Jason Wang wrote:
>>>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>>>> index bd4a70d..7cb2684 100644
>>>> --- a/drivers/net/macvtap.c
>>>> +++ b/drivers/net/macvtap.c
>>>> @@ -519,6 +519,7 @@ static int zerocopy_sg_from_iovec(struct
>> sk_buff
>>>> *skb, const struct iovec *from,
>>>> struct page *page[MAX_SKB_FRAGS];
>>>> int num_pages;
>>>> unsigned long base;
>>>> + unsigned long truesize;
>>>>
>>>> len = from->iov_len - offset;
>>>> if (!len) {
>>>> @@ -533,10 +534,11 @@ static int zerocopy_sg_from_iovec(struct
>> sk_buff
>>>> *skb, const struct iovec *from,
>>>> (num_pages> MAX_SKB_FRAGS -
>>>> skb_shinfo(skb)->nr_frags))
>>>> /* put_page is in skb free */
>>>> return -EFAULT;
>>>> + truesize = size * PAGE_SIZE;
>>> Here should be truesize = size * PAGE_SIZE - offset, right?
>>>
>> We get the whole user page, so need to account them all. Also this is
>> aligned with skb_copy_ubufs().
> Then this would double count the size of "first" offset left from
> previous copy, both skb->len and truesize.
>
> Thanks
> Shirley
>
I don't see how this affects skb->len. As for truesize, I think the two
cases are different: when the offset is not zero, the data in this vector
is divided into two parts. The first part is copied into the skb directly,
and the second is pinned as whole userspace pages by get_user_pages_fast(),
so we need to count the whole pages against the socket limit to prevent a
malicious application from bypassing it.
Thanks
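To make the accounting concrete: when get_user_pages_fast() pins the part of an iovec that starts at address base and spans len bytes, every page it touches is pinned in full, so charging by whole pages gives truesize = pages * PAGE_SIZE. For example, data starting 100 bytes into a page and 4096 bytes long touches 2 pages, so 8192 bytes are charged. A sketch of that arithmetic, assuming 4 KiB pages; the macro and function names are hypothetical, not the macvtap code.

```c
/* Assumed 4 KiB pages for this sketch. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Number of pages touched by the byte range [base, base + len). */
static unsigned long pages_spanned(unsigned long base, unsigned long len)
{
	unsigned long offset = base & (SKETCH_PAGE_SIZE - 1);

	return (offset + len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Charge every pinned page in full, as argued above. */
static unsigned long pinned_truesize(unsigned long base, unsigned long len)
{
	return pages_spanned(base, len) * SKETCH_PAGE_SIZE;
}
```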
* Re: linux-next: manual merge of the net-next tree with the sparc-next tree
From: Stephen Rothwell @ 2012-05-17 3:04 UTC
To: Sam Ravnborg; +Cc: David Miller, netdev, linux-next, linux-kernel
In-Reply-To: <20120516050245.GA407@merkur.ravnborg.org>
Hi Sam,
On Wed, 16 May 2012 07:02:45 +0200 Sam Ravnborg <sam@ravnborg.org> wrote:
>
> On Wed, May 16, 2012 at 02:39:44PM +1000, Stephen Rothwell wrote:
> > Hi all,
> >
> > Today's linux-next merge of the net-next tree got a conflict in
> > arch/sparc/Makefile between commit e1d7de8377e6 ("sparc: introduce
> > arch/sparc/Kbuild") from the sparc-next tree and commit 2809a2087cc4
> > ("net: filter: Just In Time compiler for sparc") from the net-next tree.
> >
> > I suspect that the core-y net bit below should be changed to be a obj-y
> > bit of arch/sparc/Kbuild ...
>
> Correct - like this:
>
> arch/sparc/Kbuild:
>
> obj-y += net/
So I applied this merge fixup to the merge of the net-next tree today
(and can carry it as necessary):
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Thu, 17 May 2012 13:00:07 +1000
Subject: [PATCH] net: arch/sparc/Makefile merge fixup
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
arch/sparc/Kbuild | 1 +
arch/sparc/Makefile | 1 -
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/sparc/Kbuild b/arch/sparc/Kbuild
index 27b540d..5cd0116 100644
--- a/arch/sparc/Kbuild
+++ b/arch/sparc/Kbuild
@@ -5,3 +5,4 @@
obj-y += kernel/
obj-y += mm/
obj-y += math-emu/
+obj-y += net/
diff --git a/arch/sparc/Makefile b/arch/sparc/Makefile
index 554e38f..b9a72e2 100644
--- a/arch/sparc/Makefile
+++ b/arch/sparc/Makefile
@@ -54,7 +54,6 @@ head-y += arch/sparc/kernel/init_task.o
# See arch/sparc/Kbuild for the core part of the kernel
core-y += arch/sparc/
-core-y += arch/sparc/net/
libs-y += arch/sparc/prom/
libs-y += arch/sparc/lib/
--
1.7.10.280.gaa39
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
* Re: [PATCH v5 2/2] decrement static keys on real destroy time
From: Glauber Costa @ 2012-05-17 3:06 UTC
To: Andrew Morton
Cc: cgroups, linux-mm, devel, kamezawa.hiroyu, netdev, Tejun Heo,
Li Zefan, Johannes Weiner, Michal Hocko
In-Reply-To: <20120516140637.17741df6.akpm@linux-foundation.org>
On 05/17/2012 01:06 AM, Andrew Morton wrote:
> On Fri, 11 May 2012 17:11:17 -0300
> Glauber Costa<glommer@parallels.com> wrote:
>
>> We call the destroy function when a cgroup starts to be removed,
>> such as by a rmdir event.
>>
>> However, because of our reference counters, some objects are still
>> inflight. Right now, we are decrementing the static_keys at destroy()
>> time, meaning that if we get rid of the last static_key reference,
>> some objects will still have charges, but the code to properly
>> uncharge them won't be run.
>>
>> This becomes a problem specially if it is ever enabled again, because
>> now new charges will be added to the staled charges making keeping
>> it pretty much impossible.
>>
>> We just need to be careful with the static branch activation:
>> since there is no particular preferred order of their activation,
>> we need to make sure that we only start using it after all
>> call sites are active. This is achieved by having a per-memcg
>> flag that is only updated after static_key_slow_inc() returns.
>> At this time, we are sure all sites are active.
>>
>> This is made per-memcg, not global, for a reason:
>> it also has the effect of making socket accounting more
>> consistent. The first memcg to be limited will trigger static_key()
>> activation, therefore, accounting. But all the others will then be
>> accounted no matter what. After this patch, only limited memcgs
>> will have its sockets accounted.
>>
>> ...
>>
>> @@ -107,10 +104,31 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val)
>> tcp->tcp_prot_mem[i] = min_t(long, val>> PAGE_SHIFT,
>> net->ipv4.sysctl_tcp_mem[i]);
>>
>> - if (val == RESOURCE_MAX&& old_lim != RESOURCE_MAX)
>> - static_key_slow_dec(&memcg_socket_limit_enabled);
>> - else if (old_lim == RESOURCE_MAX&& val != RESOURCE_MAX)
>> - static_key_slow_inc(&memcg_socket_limit_enabled);
>> + if (val == RESOURCE_MAX)
>> + cg_proto->active = false;
>> + else if (val != RESOURCE_MAX) {
>> + /*
>> + * ->activated needs to be written after the static_key update.
>> + * This is what guarantees that the socket activation function
>> + * is the last one to run. See sock_update_memcg() for details,
>> + * and note that we don't mark any socket as belonging to this
>> + * memcg until that flag is up.
>> + *
>> + * We need to do this, because static_keys will span multiple
>> + * sites, but we can't control their order. If we mark a socket
>> + * as accounted, but the accounting functions are not patched in
>> + * yet, we'll lose accounting.
>> + *
>> + * We never race with the readers in sock_update_memcg(), because
>> + * when this value change, the code to process it is not patched in
>> + * yet.
>> + */
>> + if (!cg_proto->activated) {
>> + static_key_slow_inc(&memcg_socket_limit_enabled);
>> + cg_proto->activated = true;
>> + }
>
> If two threads run this code concurrently, they can both see
> cg_proto->activated==false and they will both run
> static_key_slow_inc().
>
> Hopefully there's some locking somewhere which prevents this, but it is
> unobvious. We should comment this, probably at the cg_proto.activated
> definition site. Or we should fix the bug ;)
>
If that happens, the locking in static_key_slow_inc() will prevent any
damage. My previous version had explicit code to prevent that, but it was
pointed out to us that this is already part of the static_key expectations,
so that code was dropped.
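Andrew's concern is a classic check-then-act race on cg_proto->activated. As an alternative, explicit approach (not what the patch does; the patch relies on static_key_slow_inc()'s own locking, as explained above), the one-time increment can be claimed with a C11 test-and-set while the flag readers check is still published only after the increment, as the patch comment requires. In this sketch key_count stands in for the static key's reference count, and the struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stands in for the static key's reference count. */
static atomic_int key_count;

struct cg_proto_sketch {
	atomic_flag inc_claimed; /* set by whoever performs the increment */
	atomic_bool activated;   /* what readers check; set after the increment */
};

static void limit_enabled(struct cg_proto_sketch *p)
{
	/* Only the first caller wins the test-and-set and bumps the key. */
	if (!atomic_flag_test_and_set(&p->inc_claimed)) {
		atomic_fetch_add(&key_count, 1);
		/* Publish only after the increment (sequentially consistent). */
		atomic_store(&p->activated, true);
	}
}
```

Separating "who increments" from "readers may now use it" is what lets the guard be race-free without publishing the flag too early.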
* Re: [PATCH v5 2/2] decrement static keys on real destroy time
From: Glauber Costa @ 2012-05-17 3:09 UTC
To: Andrew Morton
Cc: cgroups, linux-mm, devel, kamezawa.hiroyu, netdev, Tejun Heo, Li Zefan,
Johannes Weiner, Michal Hocko
In-Reply-To: <20120516141342.911931e7.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
On 05/17/2012 01:13 AM, Andrew Morton wrote:
> On Fri, 11 May 2012 17:11:17 -0300
> Glauber Costa<glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> wrote:
>
>> We call the destroy function when a cgroup starts to be removed,
>> such as by a rmdir event.
>>
>> However, because of our reference counters, some objects are still
>> inflight. Right now, we are decrementing the static_keys at destroy()
>> time, meaning that if we get rid of the last static_key reference,
>> some objects will still have charges, but the code to properly
>> uncharge them won't be run.
>>
>> This becomes a problem specially if it is ever enabled again, because
>> now new charges will be added to the staled charges making keeping
>> it pretty much impossible.
>>
>> We just need to be careful with the static branch activation:
>> since there is no particular preferred order of their activation,
>> we need to make sure that we only start using it after all
>> call sites are active. This is achieved by having a per-memcg
>> flag that is only updated after static_key_slow_inc() returns.
>> At this time, we are sure all sites are active.
>>
>> This is made per-memcg, not global, for a reason:
>> it also has the effect of making socket accounting more
>> consistent. The first memcg to be limited will trigger static_key()
>> activation, therefore, accounting. But all the others will then be
>> accounted no matter what. After this patch, only limited memcgs
>> will have its sockets accounted.
>
> So I'm scratching my head over what the actual bug is, and how
> important it is. AFAICT it will cause charging stats to exhibit some
> inaccuracy when memcg's are being torn down?
>
> I don't know how serious this in in the real world and so can't decide
> which kernel version(s) we should fix.
>
> When fixing bugs, please always fully describe the bug's end-user
> impact, so that I and others can make these sorts of decisions.
Hi Andrew,
I believe that was described in patch 0/2?
In any case, this is something we need fixed, but it is not -stable
material or anything.
The bug leads to misaccounting when we quickly enable and disable the
limit in a loop. We have a synthetic script that demonstrates it.
* Re: [PATCH] virtio_net: invoke softirqs after __napi_schedule
From: Rusty Russell @ 2012-05-17 3:32 UTC
To: Michael S. Tsirkin, David Miller
Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <20120516075712.GA2921@redhat.com>
On Wed, 16 May 2012 10:57:13 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> __napi_schedule might raise softirq but nothing
> causes do_softirq to trigger, so it does not in fact
> run. As a result,
> the error message "NOHZ: local_softirq_pending 08"
> sometimes occurs during boot of a KVM guest when the network service is
> started and we are oom:
>
> ...
> Bringing up loopback interface: [ OK ]
> Bringing up interface eth0:
> Determining IP information for eth0...NOHZ: local_softirq_pending 08
> done.
> [ OK ]
> ...
>
> Further, receive queue processing might get delayed
> indefinitely until some interrupt triggers:
> virtio_net expected napi to be run immediately.
>
> One way to cause do_softirq to be executed is by
> invoking local_bh_enable(). As __napi_schedule is
> normally called from bh or irq context, this
> seems to make sense: disable bh before __napi_schedule
> and enable afterwards.
>
> Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
> Tested-by: Ulrich Obergfell <uobergfe@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> To test, one can hack try_fill_recv to always report oom.
> I'm not sure it's not too late for 3.4, but we can try.
> Rusty, could you review ASAP pls?
It's missing a big comment: it's a very complicated way of calling
do_softirq().
Indeed, this function is only used when we are not in interrupt
context. It's not hot at all, in any ideal scenario.
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
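Why the bh disable/enable pair helps can be modelled in a few lines: raising a softirq only sets a pending bit, and something has to run the pending work later; local_bh_enable() is one such point. In the toy below (hypothetical names, not kernel code; irq exit and ksoftirqd, the other places pending softirqs get run, are left out), bit 3 corresponds to NET_RX_SOFTIRQ, which is why a stuck NET_RX softirq shows up as "local_softirq_pending 08".

```c
#define TOY_NET_RX_SOFTIRQ 3 /* same bit position as NET_RX_SOFTIRQ */

struct toy_cpu {
	unsigned int pending;  /* bitmask of raised softirqs */
	int bh_disable_depth;
	int handler_runs;
};

static void toy_raise_softirq(struct toy_cpu *c, int nr)
{
	c->pending |= 1u << nr; /* only marks work as pending */
}

static void toy_do_softirq(struct toy_cpu *c)
{
	if (c->pending) {
		c->handler_runs++;
		c->pending = 0;
	}
}

static void toy_local_bh_disable(struct toy_cpu *c)
{
	c->bh_disable_depth++;
}

/* Leaving the outermost bh-disabled section runs anything raised meanwhile. */
static void toy_local_bh_enable(struct toy_cpu *c)
{
	if (--c->bh_disable_depth == 0)
		toy_do_softirq(c);
}
```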
* Re: [PATCH] virtio_net: invoke softirqs after __napi_schedule
From: David Miller @ 2012-05-17 3:40 UTC
To: rusty; +Cc: netdev, virtualization, linux-kernel, mst
In-Reply-To: <87vcjvzdlm.fsf@rustcorp.com.au>
From: Rusty Russell <rusty@rustcorp.com.au>
Date: Thu, 17 May 2012 13:02:53 +0930
> On Wed, 16 May 2012 10:57:13 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> __napi_schedule might raise softirq but nothing
>> causes do_softirq to trigger, so it does not in fact
>> run. As a result,
>> the error message "NOHZ: local_softirq_pending 08"
>> sometimes occurs during boot of a KVM guest when the network service is
>> started and we are oom:
>>
>> ...
>> Bringing up loopback interface: [ OK ]
>> Bringing up interface eth0:
>> Determining IP information for eth0...NOHZ: local_softirq_pending 08
>> done.
>> [ OK ]
>> ...
>>
>> Further, receive queue processing might get delayed
>> indefinitely until some interrupt triggers:
>> virtio_net expected napi to be run immediately.
>>
>> One way to cause do_softirq to be executed is by
>> invoking local_bh_enable(). As __napi_schedule is
>> normally called from bh or irq context, this
>> seems to make sense: disable bh before __napi_schedule
>> and enable afterwards.
>>
>> Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
>> Tested-by: Ulrich Obergfell <uobergfe@redhat.com>
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
...
> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Michael, you're best to submit this directly to Linus as I just
made what I hope is my last push to him for 3.4 today.
* [PATCH net-next] pktgen: Use pr_debug
From: Joe Perches @ 2012-05-17 3:50 UTC
To: David S. Miller; +Cc: netdev, linux-kernel
Convert printk(KERN_DEBUG ...) calls to pr_debug(), which
can use dynamic debugging.
Remove the embedded "pktgen: " prefixes from the converted
messages, as pr_fmt adds them.
Align arguments.
Signed-off-by: Joe Perches <joe@perches.com>
---
net/core/pktgen.c | 41 ++++++++++++++++++-----------------------
1 files changed, 18 insertions(+), 23 deletions(-)
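The "pktgen: " prefixes can be dropped because pr_debug() and friends pass their format string through pr_fmt(). A userspace sketch of that macro layering, assuming pktgen.c defines pr_fmt to prepend the module name, as the changelog implies; sketch_pr_debug() is a hypothetical stand-in that formats into a buffer instead of logging and, like the kernel macros, relies on the GNU ##__VA_ARGS__ extension.

```c
#include <stdio.h>

/* Assumed prefix definition, in the style of the kernel's pr_fmt(). */
#define pr_fmt(fmt) "pktgen: " fmt

/* Hypothetical stand-in for pr_debug(): formats into buf instead of the log.
 * String-literal concatenation in pr_fmt() glues the prefix onto fmt. */
#define sketch_pr_debug(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), ##__VA_ARGS__)
```

Because the prefix is added at compile time by concatenation, each call site only carries the message itself, e.g. sketch_pr_debug(b, sizeof b, "VLAN turned on\n") yields "pktgen: VLAN turned on\n".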
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 3391257..d22509b 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -891,8 +891,8 @@ static ssize_t pktgen_if_write(struct file *file,
if (copy_from_user(tb, user_buffer, copy))
return -EFAULT;
tb[copy] = 0;
- printk(KERN_DEBUG "pktgen: %s,%lu buffer -:%s:-\n", name,
- (unsigned long)count, tb);
+ pr_debug("%s,%lu buffer -:%s:-\n",
+ name, (unsigned long)count, tb);
}
if (!strcmp(name, "min_pkt_size")) {
@@ -1261,8 +1261,7 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->cur_daddr = pkt_dev->daddr_min;
}
if (debug)
- printk(KERN_DEBUG "pktgen: dst_min set to: %s\n",
- pkt_dev->dst_min);
+ pr_debug("dst_min set to: %s\n", pkt_dev->dst_min);
i += len;
sprintf(pg_result, "OK: dst_min=%s", pkt_dev->dst_min);
return count;
@@ -1284,8 +1283,7 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->cur_daddr = pkt_dev->daddr_max;
}
if (debug)
- printk(KERN_DEBUG "pktgen: dst_max set to: %s\n",
- pkt_dev->dst_max);
+ pr_debug("dst_max set to: %s\n", pkt_dev->dst_max);
i += len;
sprintf(pg_result, "OK: dst_max=%s", pkt_dev->dst_max);
return count;
@@ -1307,7 +1305,7 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->cur_in6_daddr = pkt_dev->in6_daddr;
if (debug)
- printk(KERN_DEBUG "pktgen: dst6 set to: %s\n", buf);
+ pr_debug("dst6 set to: %s\n", buf);
i += len;
sprintf(pg_result, "OK: dst6=%s", buf);
@@ -1329,7 +1327,7 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->cur_in6_daddr = pkt_dev->min_in6_daddr;
if (debug)
- printk(KERN_DEBUG "pktgen: dst6_min set to: %s\n", buf);
+ pr_debug("dst6_min set to: %s\n", buf);
i += len;
sprintf(pg_result, "OK: dst6_min=%s", buf);
@@ -1350,7 +1348,7 @@ static ssize_t pktgen_if_write(struct file *file,
snprintf(buf, sizeof(buf), "%pI6c", &pkt_dev->max_in6_daddr);
if (debug)
- printk(KERN_DEBUG "pktgen: dst6_max set to: %s\n", buf);
+ pr_debug("dst6_max set to: %s\n", buf);
i += len;
sprintf(pg_result, "OK: dst6_max=%s", buf);
@@ -1373,7 +1371,7 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->cur_in6_saddr = pkt_dev->in6_saddr;
if (debug)
- printk(KERN_DEBUG "pktgen: src6 set to: %s\n", buf);
+ pr_debug("src6 set to: %s\n", buf);
i += len;
sprintf(pg_result, "OK: src6=%s", buf);
@@ -1394,8 +1392,7 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->cur_saddr = pkt_dev->saddr_min;
}
if (debug)
- printk(KERN_DEBUG "pktgen: src_min set to: %s\n",
- pkt_dev->src_min);
+ pr_debug("src_min set to: %s\n", pkt_dev->src_min);
i += len;
sprintf(pg_result, "OK: src_min=%s", pkt_dev->src_min);
return count;
@@ -1415,8 +1412,7 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->cur_saddr = pkt_dev->saddr_max;
}
if (debug)
- printk(KERN_DEBUG "pktgen: src_max set to: %s\n",
- pkt_dev->src_max);
+ pr_debug("src_max set to: %s\n", pkt_dev->src_max);
i += len;
sprintf(pg_result, "OK: src_max=%s", pkt_dev->src_max);
return count;
@@ -1527,7 +1523,7 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->svlan_id = 0xffff;
if (debug)
- printk(KERN_DEBUG "pktgen: VLAN/SVLAN auto turned off\n");
+ pr_debug("VLAN/SVLAN auto turned off\n");
}
return count;
}
@@ -1542,10 +1538,10 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->vlan_id = value; /* turn on VLAN */
if (debug)
- printk(KERN_DEBUG "pktgen: VLAN turned on\n");
+ pr_debug("VLAN turned on\n");
if (debug && pkt_dev->nr_labels)
- printk(KERN_DEBUG "pktgen: MPLS auto turned off\n");
+ pr_debug("MPLS auto turned off\n");
pkt_dev->nr_labels = 0; /* turn off MPLS */
sprintf(pg_result, "OK: vlan_id=%u", pkt_dev->vlan_id);
@@ -1554,7 +1550,7 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->svlan_id = 0xffff;
if (debug)
- printk(KERN_DEBUG "pktgen: VLAN/SVLAN turned off\n");
+ pr_debug("VLAN/SVLAN turned off\n");
}
return count;
}
@@ -1599,10 +1595,10 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->svlan_id = value; /* turn on SVLAN */
if (debug)
- printk(KERN_DEBUG "pktgen: SVLAN turned on\n");
+ pr_debug("SVLAN turned on\n");
if (debug && pkt_dev->nr_labels)
- printk(KERN_DEBUG "pktgen: MPLS auto turned off\n");
+ pr_debug("MPLS auto turned off\n");
pkt_dev->nr_labels = 0; /* turn off MPLS */
sprintf(pg_result, "OK: svlan_id=%u", pkt_dev->svlan_id);
@@ -1611,7 +1607,7 @@ static ssize_t pktgen_if_write(struct file *file,
pkt_dev->svlan_id = 0xffff;
if (debug)
- printk(KERN_DEBUG "pktgen: VLAN/SVLAN turned off\n");
+ pr_debug("VLAN/SVLAN turned off\n");
}
return count;
}
@@ -1779,8 +1775,7 @@ static ssize_t pktgen_thread_write(struct file *file,
i += len;
if (debug)
- printk(KERN_DEBUG "pktgen: t=%s, count=%lu\n",
- name, (unsigned long)count);
+ pr_debug("t=%s, count=%lu\n", name, (unsigned long)count);
if (!t) {
pr_err("ERROR: No thread\n");
* Re: [RFC 13/13] USB: Disable hub-initiated LPM for comms devices.
From: Sarah Sharp @ 2012-05-17 4:52 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: gigaset307x-common-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
libertas-dev-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
users-poMEt7QlJxcwIE2E9O76wjtx2kNaKg5H,
linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
ath9k-devel-xDcbHBWguxHbcTqmT+pZeQ, Alan Stern
In-Reply-To: <20120516232019.GA960-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
On Wed, May 16, 2012 at 04:20:19PM -0700, Greg Kroah-Hartman wrote:
> On Wed, May 16, 2012 at 03:45:28PM -0700, Sarah Sharp wrote:
> > [Resending with a smaller Cc list]
> >
> > Hub-initiated LPM is not good for USB communications devices. Comms
> > devices should be able to tell when their link can go into a lower power
> > state, because they know when an incoming transmission is finished.
> > Ideally, these devices would slam their links into a lower power state,
> > using the device-initiated LPM, after finishing the last packet of their
> > data transfer.
> >
> > If we enable the idle timeouts for the parent hubs to enable
> > hub-initiated LPM, we will get a lot of useless LPM packets on the bus
> > as the devices reject LPM transitions when they're in the middle of
> > receiving data. Worse, some devices might blindly accept the
> > hub-initiated LPM and power down their radios while they're in the
> > middle of receiving a transmission.
> >
> > The Intel Windows folks are disabling hub-initiated LPM for all USB
> > communications devices under a xHCI USB 3.0 host. In order to keep
> > the Linux behavior as close as possible to Windows, we need to do the
> > same in Linux.
>
> How is the USB core on Windows determining that LPM should be turned off
> for these devices? Surely they aren't modifying each individual driver
> like this is, right? Any way we also can do this in the core?
No, I don't think they're modifying individual drivers. Maybe they
placed a shim/filter driver below other drivers?
Basically, I don't know the exact details of what the Windows folks are
doing. The recommendation from the Intel Windows team was simply to
turn hub-initiated LPM off for "all communications devices". Perhaps
the Windows USB core is looking for specific USB class codes? Or maybe
it has some older API that lets the core know it's a communications
device?
I'm not really sure we can do it in the USB core without basically
duplicating all the class/PID/VID matching in the communications driver.
I think just adding a flag might be the best way. I'm open to
suggestions though.
> Or, turn it around the other way, and only enable it if we know it's
> safe to do so, in each driver, but I guess that would be even messier.
Yeah, I think it would be messier.
Sarah Sharp
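The trade-off discussed above can be sketched in userspace C. The sketch uses hypothetical, heavily simplified structures (`toy_usb_driver`, `toy_usb_interface`), hypothetical helpers (`core_class_says_comms()`, `hub_lpm_allowed()`), and an illustrative per-driver flag named `disable_hub_initiated_lpm`. Only the class code values are real USB-IF assignments. It shows why a core-only check keyed on class codes can miss a vendor-specific (class 0xff) radio, while a flag set by the bound driver still covers it.

```c
#include <stdbool.h>
#include <stdint.h>

/* USB-IF interface class codes (real values). */
#define USB_CLASS_COMM			0x02	/* CDC control */
#define USB_CLASS_CDC_DATA		0x0a
#define USB_CLASS_WIRELESS_CONTROLLER	0xe0	/* e.g. Bluetooth */
#define USB_CLASS_VENDOR_SPEC		0xff

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_usb_driver {
	const char *name;
	bool disable_hub_initiated_lpm;	/* illustrative per-driver opt-out */
};

struct toy_usb_interface {
	uint8_t bInterfaceClass;
	const struct toy_usb_driver *driver;
};

/* Option A: the core guesses from the interface class code alone. */
static bool core_class_says_comms(const struct toy_usb_interface *intf)
{
	switch (intf->bInterfaceClass) {
	case USB_CLASS_COMM:
	case USB_CLASS_CDC_DATA:
	case USB_CLASS_WIRELESS_CONTROLLER:
		return true;
	default:
		return false;	/* vendor-specific radios slip through here */
	}
}

/* Option B: the bound driver states its preference explicitly. */
static bool hub_lpm_allowed(const struct toy_usb_interface *intf)
{
	return !(intf->driver && intf->driver->disable_hub_initiated_lpm);
}
```

For example, a vendor-specific Wi-Fi dongle whose driver sets the flag is kept out of hub-initiated LPM even though `core_class_says_comms()` returns false for it; catching that case in the core would mean duplicating the driver's VID/PID table.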
* [PATCH net-next] net: ipv6: ndisc: Neaten ND_PRINTx macros
From: Joe Perches @ 2012-05-17 5:28 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, linux-kernel
Why use several macros when one will do?
Convert the multiple ND_PRINTKx macros to a single
ND_PRINTK macro. Use the new net_<level>_ratelimited
mechanism too.
Add pr_fmt with "ICMPv6: " as prefix.
Remove embedded ICMPv6 prefixes from messages.
Signed-off-by: Joe Perches <joe@perches.com>
---
net/ipv6/ndisc.c | 216 +++++++++++++++++++++++-------------------------------
1 files changed, 91 insertions(+), 125 deletions(-)
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index cbb863d..c7a27ac 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -27,27 +27,7 @@
* YOSHIFUJI Hideaki @USAGI : Verify ND options properly
*/
-/* Set to 3 to get tracing... */
-#define ND_DEBUG 1
-
-#define ND_PRINTK(fmt, args...) do { if (net_ratelimit()) { printk(fmt, ## args); } } while(0)
-#define ND_NOPRINTK(x...) do { ; } while(0)
-#define ND_PRINTK0 ND_PRINTK
-#define ND_PRINTK1 ND_NOPRINTK
-#define ND_PRINTK2 ND_NOPRINTK
-#define ND_PRINTK3 ND_NOPRINTK
-#if ND_DEBUG >= 1
-#undef ND_PRINTK1
-#define ND_PRINTK1 ND_PRINTK
-#endif
-#if ND_DEBUG >= 2
-#undef ND_PRINTK2
-#define ND_PRINTK2 ND_PRINTK
-#endif
-#if ND_DEBUG >= 3
-#undef ND_PRINTK3
-#define ND_PRINTK3 ND_PRINTK
-#endif
+#define pr_fmt(fmt) "ICMPv6: " fmt
#include <linux/module.h>
#include <linux/errno.h>
@@ -92,6 +72,15 @@
#include <linux/netfilter.h>
#include <linux/netfilter_ipv6.h>
+/* Set to 3 to get tracing... */
+#define ND_DEBUG 1
+
+#define ND_PRINTK(val, level, fmt, ...) \
+do { \
+ if (val <= ND_DEBUG) \
+ net_##level##_ratelimited(fmt, ##__VA_ARGS__); \
+} while (0)
+
static u32 ndisc_hash(const void *pkey,
const struct net_device *dev,
__u32 *hash_rnd);
@@ -265,10 +254,9 @@ static struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len,
case ND_OPT_MTU:
case ND_OPT_REDIRECT_HDR:
if (ndopts->nd_opt_array[nd_opt->nd_opt_type]) {
- ND_PRINTK2(KERN_WARNING
- "%s: duplicated ND6 option found: type=%d\n",
- __func__,
- nd_opt->nd_opt_type);
+ ND_PRINTK(2, warn,
+ "%s: duplicated ND6 option found: type=%d\n",
+ __func__, nd_opt->nd_opt_type);
} else {
ndopts->nd_opt_array[nd_opt->nd_opt_type] = nd_opt;
}
@@ -296,10 +284,11 @@ static struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len,
* to accommodate future extension to the
* protocol.
*/
- ND_PRINTK2(KERN_NOTICE
- "%s: ignored unsupported option; type=%d, len=%d\n",
- __func__,
- nd_opt->nd_opt_type, nd_opt->nd_opt_len);
+ ND_PRINTK(2, notice,
+ "%s: ignored unsupported option; type=%d, len=%d\n",
+ __func__,
+ nd_opt->nd_opt_type,
+ nd_opt->nd_opt_len);
}
}
opt_len -= l;
@@ -455,9 +444,8 @@ struct sk_buff *ndisc_build_skb(struct net_device *dev,
len + hlen + tlen),
1, &err);
if (!skb) {
- ND_PRINTK0(KERN_ERR
- "ICMPv6 ND: %s failed to allocate an skb, err=%d.\n",
- __func__, err);
+ ND_PRINTK(0, err, "ND: %s failed to allocate an skb, err=%d\n",
+ __func__, err);
return NULL;
}
@@ -693,8 +681,9 @@ static void ndisc_solicit(struct neighbour *neigh, struct sk_buff *skb)
if ((probes -= neigh->parms->ucast_probes) < 0) {
if (!(neigh->nud_state & NUD_VALID)) {
- ND_PRINTK1(KERN_DEBUG "%s: trying to ucast probe in NUD_INVALID: %pI6\n",
- __func__, target);
+ ND_PRINTK(1, dbg,
+ "%s: trying to ucast probe in NUD_INVALID: %pI6\n",
+ __func__, target);
}
ndisc_send_ns(dev, neigh, target, target, saddr);
} else if ((probes -= neigh->parms->app_probes) < 0) {
@@ -740,8 +729,7 @@ static void ndisc_recv_ns(struct sk_buff *skb)
int is_router = -1;
if (ipv6_addr_is_multicast(&msg->target)) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 NS: multicast target address");
+ ND_PRINTK(2, warn, "NS: multicast target address\n");
return;
}
@@ -754,22 +742,20 @@ static void ndisc_recv_ns(struct sk_buff *skb)
daddr->s6_addr32[1] == htonl(0x00000000) &&
daddr->s6_addr32[2] == htonl(0x00000001) &&
daddr->s6_addr [12] == 0xff )) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 NS: bad DAD packet (wrong destination)\n");
+ ND_PRINTK(2, warn, "NS: bad DAD packet (wrong destination)\n");
return;
}
if (!ndisc_parse_options(msg->opt, ndoptlen, &ndopts)) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 NS: invalid ND options\n");
+ ND_PRINTK(2, warn, "NS: invalid ND options\n");
return;
}
if (ndopts.nd_opts_src_lladdr) {
lladdr = ndisc_opt_addr_data(ndopts.nd_opts_src_lladdr, dev);
if (!lladdr) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 NS: invalid link-layer address length\n");
+ ND_PRINTK(2, warn,
+ "NS: invalid link-layer address length\n");
return;
}
@@ -779,8 +765,8 @@ static void ndisc_recv_ns(struct sk_buff *skb)
* in the message.
*/
if (dad) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 NS: bad DAD packet (link-layer address option)\n");
+ ND_PRINTK(2, warn,
+ "NS: bad DAD packet (link-layer address option)\n");
return;
}
}
@@ -898,34 +884,30 @@ static void ndisc_recv_na(struct sk_buff *skb)
struct neighbour *neigh;
if (skb->len < sizeof(struct nd_msg)) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 NA: packet too short\n");
+ ND_PRINTK(2, warn, "NA: packet too short\n");
return;
}
if (ipv6_addr_is_multicast(&msg->target)) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 NA: target address is multicast.\n");
+ ND_PRINTK(2, warn, "NA: target address is multicast\n");
return;
}
if (ipv6_addr_is_multicast(daddr) &&
msg->icmph.icmp6_solicited) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 NA: solicited NA is multicasted.\n");
+ ND_PRINTK(2, warn, "NA: solicited NA is multicasted\n");
return;
}
if (!ndisc_parse_options(msg->opt, ndoptlen, &ndopts)) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 NS: invalid ND option\n");
+ ND_PRINTK(2, warn, "NS: invalid ND option\n");
return;
}
if (ndopts.nd_opts_tgt_lladdr) {
lladdr = ndisc_opt_addr_data(ndopts.nd_opts_tgt_lladdr, dev);
if (!lladdr) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 NA: invalid link-layer address length\n");
+ ND_PRINTK(2, warn,
+ "NA: invalid link-layer address length\n");
return;
}
}
@@ -946,9 +928,9 @@ static void ndisc_recv_na(struct sk_buff *skb)
unsolicited advertisement.
*/
if (skb->pkt_type != PACKET_LOOPBACK)
- ND_PRINTK1(KERN_WARNING
- "ICMPv6 NA: someone advertises our address %pI6 on %s!\n",
- &ifp->addr, ifp->idev->dev->name);
+ ND_PRINTK(1, warn,
+ "NA: someone advertises our address %pI6 on %s!\n",
+ &ifp->addr, ifp->idev->dev->name);
in6_ifa_put(ifp);
return;
}
@@ -1010,8 +992,7 @@ static void ndisc_recv_rs(struct sk_buff *skb)
idev = __in6_dev_get(skb->dev);
if (!idev) {
- if (net_ratelimit())
- ND_PRINTK1("ICMP6 RS: can't find in6 device\n");
+ ND_PRINTK(1, err, "RS: can't find in6 device\n");
return;
}
@@ -1028,8 +1009,7 @@ static void ndisc_recv_rs(struct sk_buff *skb)
/* Parse ND options */
if (!ndisc_parse_options(rs_msg->opt, ndoptlen, &ndopts)) {
- if (net_ratelimit())
- ND_PRINTK2("ICMP6 NS: invalid ND option, ignored\n");
+ ND_PRINTK(2, notice, "NS: invalid ND option, ignored\n");
goto out;
}
@@ -1127,20 +1107,17 @@ static void ndisc_router_discovery(struct sk_buff *skb)
optlen = (skb->tail - skb->transport_header) - sizeof(struct ra_msg);
if (!(ipv6_addr_type(&ipv6_hdr(skb)->saddr) & IPV6_ADDR_LINKLOCAL)) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 RA: source address is not link-local.\n");
+ ND_PRINTK(2, warn, "RA: source address is not link-local\n");
return;
}
if (optlen < 0) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 RA: packet too short\n");
+ ND_PRINTK(2, warn, "RA: packet too short\n");
return;
}
#ifdef CONFIG_IPV6_NDISC_NODETYPE
if (skb->ndisc_nodetype == NDISC_NODETYPE_HOST) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 RA: from host or unauthorized router\n");
+ ND_PRINTK(2, warn, "RA: from host or unauthorized router\n");
return;
}
#endif
@@ -1151,15 +1128,13 @@ static void ndisc_router_discovery(struct sk_buff *skb)
in6_dev = __in6_dev_get(skb->dev);
if (in6_dev == NULL) {
- ND_PRINTK0(KERN_ERR
- "ICMPv6 RA: can't find inet6 device for %s.\n",
- skb->dev->name);
+ ND_PRINTK(0, err, "RA: can't find inet6 device for %s\n",
+ skb->dev->name);
return;
}
if (!ndisc_parse_options(opt, optlen, &ndopts)) {
- ND_PRINTK2(KERN_WARNING
- "ICMP6 RA: invalid ND options\n");
+ ND_PRINTK(2, warn, "RA: invalid ND options\n");
return;
}
@@ -1212,9 +1187,9 @@ static void ndisc_router_discovery(struct sk_buff *skb)
if (rt) {
neigh = dst_neigh_lookup(&rt->dst, &ipv6_hdr(skb)->saddr);
if (!neigh) {
- ND_PRINTK0(KERN_ERR
- "ICMPv6 RA: %s got default router without neighbour.\n",
- __func__);
+ ND_PRINTK(0, err,
+ "RA: %s got default router without neighbour\n",
+ __func__);
dst_release(&rt->dst);
return;
}
@@ -1225,22 +1200,21 @@ static void ndisc_router_discovery(struct sk_buff *skb)
}
if (rt == NULL && lifetime) {
- ND_PRINTK3(KERN_DEBUG
- "ICMPv6 RA: adding default router.\n");
+ ND_PRINTK(3, dbg, "RA: adding default router\n");
rt = rt6_add_dflt_router(&ipv6_hdr(skb)->saddr, skb->dev, pref);
if (rt == NULL) {
- ND_PRINTK0(KERN_ERR
- "ICMPv6 RA: %s failed to add default route.\n",
- __func__);
+ ND_PRINTK(0, err,
+ "RA: %s failed to add default route\n",
+ __func__);
return;
}
neigh = dst_neigh_lookup(&rt->dst, &ipv6_hdr(skb)->saddr);
if (neigh == NULL) {
- ND_PRINTK0(KERN_ERR
- "ICMPv6 RA: %s got default router without neighbour.\n",
- __func__);
+ ND_PRINTK(0, err,
+ "RA: %s got default router without neighbour\n",
+ __func__);
dst_release(&rt->dst);
return;
}
@@ -1308,8 +1282,8 @@ skip_linkparms:
lladdr = ndisc_opt_addr_data(ndopts.nd_opts_src_lladdr,
skb->dev);
if (!lladdr) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 RA: invalid link-layer address length\n");
+ ND_PRINTK(2, warn,
+ "RA: invalid link-layer address length\n");
goto out;
}
}
@@ -1373,9 +1347,7 @@ skip_routeinfo:
mtu = ntohl(n);
if (mtu < IPV6_MIN_MTU || mtu > skb->dev->mtu) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 RA: invalid mtu: %d\n",
- mtu);
+ ND_PRINTK(2, warn, "RA: invalid mtu: %d\n", mtu);
} else if (in6_dev->cnf.mtu6 != mtu) {
in6_dev->cnf.mtu6 = mtu;
@@ -1396,8 +1368,7 @@ skip_routeinfo:
}
if (ndopts.nd_opts_tgt_lladdr || ndopts.nd_opts_rh) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 RA: invalid RA options");
+ ND_PRINTK(2, warn, "RA: invalid RA options\n");
}
out:
if (rt)
@@ -1422,15 +1393,15 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
switch (skb->ndisc_nodetype) {
case NDISC_NODETYPE_HOST:
case NDISC_NODETYPE_NODEFAULT:
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 Redirect: from host or unauthorized router\n");
+ ND_PRINTK(2, warn,
+ "Redirect: from host or unauthorized router\n");
return;
}
#endif
if (!(ipv6_addr_type(&ipv6_hdr(skb)->saddr) & IPV6_ADDR_LINKLOCAL)) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 Redirect: source address is not link-local.\n");
+ ND_PRINTK(2, warn,
+ "Redirect: source address is not link-local\n");
return;
}
@@ -1438,8 +1409,7 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
optlen -= sizeof(struct icmp6hdr) + 2 * sizeof(struct in6_addr);
if (optlen < 0) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 Redirect: packet too short\n");
+ ND_PRINTK(2, warn, "Redirect: packet too short\n");
return;
}
@@ -1448,8 +1418,8 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
dest = target + 1;
if (ipv6_addr_is_multicast(dest)) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 Redirect: destination address is multicast.\n");
+ ND_PRINTK(2, warn,
+ "Redirect: destination address is multicast\n");
return;
}
@@ -1457,8 +1427,8 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
on_link = 1;
} else if (ipv6_addr_type(target) !=
(IPV6_ADDR_UNICAST|IPV6_ADDR_LINKLOCAL)) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 Redirect: target address is not link-local unicast.\n");
+ ND_PRINTK(2, warn,
+ "Redirect: target address is not link-local unicast\n");
return;
}
@@ -1474,16 +1444,15 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
*/
if (!ndisc_parse_options((u8*)(dest + 1), optlen, &ndopts)) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 Redirect: invalid ND options\n");
+ ND_PRINTK(2, warn, "Redirect: invalid ND options\n");
return;
}
if (ndopts.nd_opts_tgt_lladdr) {
lladdr = ndisc_opt_addr_data(ndopts.nd_opts_tgt_lladdr,
skb->dev);
if (!lladdr) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 Redirect: invalid link-layer address length\n");
+ ND_PRINTK(2, warn,
+ "Redirect: invalid link-layer address length\n");
return;
}
}
@@ -1518,16 +1487,15 @@ void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
u8 ha_buf[MAX_ADDR_LEN], *ha = NULL;
if (ipv6_get_lladdr(dev, &saddr_buf, IFA_F_TENTATIVE)) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 Redirect: no link-local address on %s\n",
- dev->name);
+ ND_PRINTK(2, warn, "Redirect: no link-local address on %s\n",
+ dev->name);
return;
}
if (!ipv6_addr_equal(&ipv6_hdr(skb)->daddr, target) &&
ipv6_addr_type(target) != (IPV6_ADDR_UNICAST|IPV6_ADDR_LINKLOCAL)) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 Redirect: target address is not link-local unicast.\n");
+ ND_PRINTK(2, warn,
+ "Redirect: target address is not link-local unicast\n");
return;
}
@@ -1546,8 +1514,8 @@ void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
rt = (struct rt6_info *) dst;
if (rt->rt6i_flags & RTF_GATEWAY) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 Redirect: destination is not a neighbour.\n");
+ ND_PRINTK(2, warn,
+ "Redirect: destination is not a neighbour\n");
goto release;
}
if (!rt->rt6i_peer)
@@ -1558,8 +1526,8 @@ void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
if (dev->addr_len) {
struct neighbour *neigh = dst_neigh_lookup(skb_dst(skb), target);
if (!neigh) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 Redirect: no neigh for target address\n");
+ ND_PRINTK(2, warn,
+ "Redirect: no neigh for target address\n");
goto release;
}
@@ -1587,9 +1555,9 @@ void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
len + hlen + tlen),
1, &err);
if (buff == NULL) {
- ND_PRINTK0(KERN_ERR
- "ICMPv6 Redirect: %s failed to allocate an skb, err=%d.\n",
- __func__, err);
+ ND_PRINTK(0, err,
+ "Redirect: %s failed to allocate an skb, err=%d\n",
+ __func__, err);
goto release;
}
@@ -1674,16 +1642,14 @@ int ndisc_rcv(struct sk_buff *skb)
__skb_push(skb, skb->data - skb_transport_header(skb));
if (ipv6_hdr(skb)->hop_limit != 255) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 NDISC: invalid hop-limit: %d\n",
- ipv6_hdr(skb)->hop_limit);
+ ND_PRINTK(2, warn, "NDISC: invalid hop-limit: %d\n",
+ ipv6_hdr(skb)->hop_limit);
return 0;
}
if (msg->icmph.icmp6_code != 0) {
- ND_PRINTK2(KERN_WARNING
- "ICMPv6 NDISC: invalid ICMPv6 code: %d\n",
- msg->icmph.icmp6_code);
+ ND_PRINTK(2, warn, "NDISC: invalid ICMPv6 code: %d\n",
+ msg->icmph.icmp6_code);
return 0;
}
@@ -1804,9 +1770,9 @@ static int __net_init ndisc_net_init(struct net *net)
err = inet_ctl_sock_create(&sk, PF_INET6,
SOCK_RAW, IPPROTO_ICMPV6, net);
if (err < 0) {
- ND_PRINTK0(KERN_ERR
- "ICMPv6 NDISC: Failed to initialize the control socket (err %d).\n",
- err);
+ ND_PRINTK(0, err,
+ "NDISC: Failed to initialize the control socket (err %d)\n",
+ err);
return err;
}
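A minimal userspace sketch of how the new ND_PRINTK macro resolves: the first argument is compared against ND_DEBUG, and the second is pasted into a `net_<level>_ratelimited` name. The ND_PRINTK definition is copied from the patch. The `net_*_ratelimited` stand-ins, `toy_ratelimit()` (a fixed budget of 10 messages with no time window, unlike the kernel's `net_ratelimit()`), and `toy_emit()` are assumptions for illustration. Like the kernel, it relies on the GNU `##__VA_ARGS__` extension.

```c
#include <stdarg.h>
#include <stdio.h>

#define pr_fmt(fmt) "ICMPv6: " fmt
#define ND_DEBUG 1

static int emitted;		/* messages that passed both filters */
static int budget = 10;		/* crude stand-in for net_ratelimit() */

static int toy_ratelimit(void)
{
	if (budget <= 0)
		return 0;
	budget--;
	return 1;
}

static void toy_emit(const char *level, const char *fmt, ...)
{
	va_list ap;

	emitted++;
	printf("[%s] ", level);
	va_start(ap, fmt);
	vprintf(fmt, ap);
	va_end(ap);
}

/* Userspace stand-ins for the net_<level>_ratelimited() family. */
#define TOY_RATELIMITED(level, fmt, ...)				\
	do {								\
		if (toy_ratelimit())					\
			toy_emit(level, pr_fmt(fmt), ##__VA_ARGS__);	\
	} while (0)
#define net_err_ratelimited(fmt, ...)	TOY_RATELIMITED("err", fmt, ##__VA_ARGS__)
#define net_warn_ratelimited(fmt, ...)	TOY_RATELIMITED("warn", fmt, ##__VA_ARGS__)
#define net_dbg_ratelimited(fmt, ...)	TOY_RATELIMITED("dbg", fmt, ##__VA_ARGS__)

/* The macro from the patch, unchanged. */
#define ND_PRINTK(val, level, fmt, ...)				\
do {								\
	if (val <= ND_DEBUG)					\
		net_##level##_ratelimited(fmt, ##__VA_ARGS__);	\
} while (0)
```

With ND_DEBUG at 1, `ND_PRINTK(2, warn, ...)` is filtered out before the rate limiter is consulted, so it does not consume the budget, while `ND_PRINTK(0, err, ...)` expands to `net_err_ratelimited(...)` and prints with the "ICMPv6: " prefix.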
* Re: [PATCH v5 2/2] decrement static keys on real destroy time
From: Andrew Morton @ 2012-05-17 5:37 UTC (permalink / raw)
To: Glauber Costa
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
devel-GEFAQzZX7r8dnm+yROfE0A,
kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
netdev-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, Li Zefan,
Johannes Weiner, Michal Hocko
In-Reply-To: <4FB46B4C.3000307-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
On Thu, 17 May 2012 07:06:52 +0400 Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> wrote:
> ...
> >> + else if (val != RESOURCE_MAX) {
> >> + /*
> >> + * ->activated needs to be written after the static_key update.
> >> + * This is what guarantees that the socket activation function
> >> + * is the last one to run. See sock_update_memcg() for details,
> >> + * and note that we don't mark any socket as belonging to this
> >> + * memcg until that flag is up.
> >> + *
> >> + * We need to do this, because static_keys will span multiple
> >> + * sites, but we can't control their order. If we mark a socket
> >> + * as accounted, but the accounting functions are not patched in
> >> + * yet, we'll lose accounting.
> >> + *
> >> + * We never race with the readers in sock_update_memcg(), because
> > >> + * when this value changes, the code to process it is not patched in
> >> + * yet.
> >> + */
> >> + if (!cg_proto->activated) {
> >> + static_key_slow_inc(&memcg_socket_limit_enabled);
> >> + cg_proto->activated = true;
> >> + }
> >
> > If two threads run this code concurrently, they can both see
> > cg_proto->activated==false and they will both run
> > static_key_slow_inc().
> >
> > Hopefully there's some locking somewhere which prevents this, but it is
> > unobvious. We should comment this, probably at the cg_proto.activated
> > definition site. Or we should fix the bug ;)
> >
> If that happens, locking in static_key_slow_inc will prevent any damage.
> My previous version had explicit code to prevent that, but it was
> pointed out to us that this is already part of the static_key expectations, so
> that was dropped.
This makes no sense. If two threads run that code concurrently,
key->enabled gets incremented twice. Nobody anywhere has a record that
this happened so it cannot be undone. key->enabled is now in an
unknown state.
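The objection can be replayed deterministically in userspace. In this sketch, `key_enabled` stands in for the static key's reference count and `key_inc()` for `static_key_slow_inc()`; both names are hypothetical. Even if the increment itself is locked, two callers that both observe `activated == false` still increment twice. Making the check and the flag update one atomic test-and-set closes that window. The sketch uses C11 `atomic_exchange()` (the kernel offers `test_and_set_bit()` for the same purpose); it is one possible fix, not necessarily what the final patch did.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: key_enabled ~ the static key's count,
 * activated_* ~ cg_proto->activated. */
static int key_enabled;
static bool activated_plain;
static atomic_bool activated_atomic;

/* The real increment takes a lock; that does not help with this race. */
static void key_inc(void)
{
	key_enabled++;
}

/* Check-then-act, split in two so an interleaving can be replayed. */
static bool plain_check(void) { return !activated_plain; }
static void plain_act(void) { key_inc(); activated_plain = true; }

/* Test-and-set: only the caller that flips false -> true increments. */
static void atomic_activate(void)
{
	if (!atomic_exchange(&activated_atomic, true))
		key_inc();
}

/* Two "threads" both check before either acts. */
static int replay_plain_race(void)
{
	key_enabled = 0;
	activated_plain = false;
	bool t1 = plain_check();
	bool t2 = plain_check();
	if (t1)
		plain_act();
	if (t2)
		plain_act();
	return key_enabled;	/* 2: the key was bumped twice */
}

/* Any ordering of two atomic_activate() calls yields one increment. */
static int replay_atomic(void)
{
	key_enabled = 0;
	atomic_store(&activated_atomic, false);
	atomic_activate();
	atomic_activate();
	return key_enabled;	/* 1 */
}
```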
* Re: linux-next: manual merge of the net-next tree with the sparc-next tree
From: Sam Ravnborg @ 2012-05-17 5:54 UTC (permalink / raw)
To: Stephen Rothwell; +Cc: David Miller, netdev, linux-next, linux-kernel
In-Reply-To: <20120517130406.1706f944820bbe8dc5e368f2@canb.auug.org.au>
> So I applied this merge fixup to the merge of the net-next tree today
> (and can carry it as necessary):
Perfect. And please carry this as we do not plan to solve this
conflict.
Sam
* [PATCH net-next] net: core: Use pr_<level>
From: Joe Perches @ 2012-05-17 5:58 UTC (permalink / raw)
To: David S. Miller; +Cc: Neil Horman, netdev, linux-kernel
Use the current logging style.
This enables use of dynamic debugging as well.
Convert printk(KERN_<LEVEL> to pr_<level>.
Add pr_fmt. Remove embedded prefixes, use
%s, __func__ instead.
Signed-off-by: Joe Perches <joe@perches.com>
---
net/core/drop_monitor.c | 10 ++++++----
net/core/neighbour.c | 13 +++++++------
net/core/net_namespace.c | 6 ++++--
net/core/netprio_cgroup.c | 6 ++++--
net/core/skbuff.c | 20 ++++++++++----------
net/core/sock.c | 25 +++++++++++++------------
6 files changed, 44 insertions(+), 36 deletions(-)
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index a7cad74..eca00a9 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -4,6 +4,8 @@
* Copyright (C) 2009 Neil Horman <nhorman@tuxdriver.com>
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/string.h>
@@ -381,10 +383,10 @@ static int __init init_net_drop_monitor(void)
struct per_cpu_dm_data *data;
int cpu, rc;
- printk(KERN_INFO "Initializing network drop monitor service\n");
+ pr_info("Initializing network drop monitor service\n");
if (sizeof(void *) > 8) {
- printk(KERN_ERR "Unable to store program counters on this arch, Drop monitor failed\n");
+ pr_err("Unable to store program counters on this arch, Drop monitor failed\n");
return -ENOSPC;
}
@@ -392,13 +394,13 @@ static int __init init_net_drop_monitor(void)
dropmon_ops,
ARRAY_SIZE(dropmon_ops));
if (rc) {
- printk(KERN_ERR "Could not create drop monitor netlink family\n");
+ pr_err("Could not create drop monitor netlink family\n");
return rc;
}
rc = register_netdevice_notifier(&dropmon_net_notifier);
if (rc < 0) {
- printk(KERN_CRIT "Failed to register netdevice notifier\n");
+ pr_crit("Failed to register netdevice notifier\n");
goto out_unreg;
}
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index fadaa81..eb09f8b 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -15,6 +15,8 @@
* Harald Welte Add neighbour cache statistics like rtstat
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/kernel.h>
@@ -712,14 +714,13 @@ void neigh_destroy(struct neighbour *neigh)
NEIGH_CACHE_STAT_INC(neigh->tbl, destroys);
if (!neigh->dead) {
- printk(KERN_WARNING
- "Destroying alive neighbour %p\n", neigh);
+ pr_warn("Destroying alive neighbour %p\n", neigh);
dump_stack();
return;
}
if (neigh_del_timer(neigh))
- printk(KERN_WARNING "Impossible event.\n");
+ pr_warn("Impossible event\n");
skb_queue_purge(&neigh->arp_queue);
neigh->arp_queue_len_bytes = 0;
@@ -1554,8 +1555,8 @@ void neigh_table_init(struct neigh_table *tbl)
write_unlock(&neigh_tbl_lock);
if (unlikely(tmp)) {
- printk(KERN_ERR "NEIGH: Registering multiple tables for "
- "family %d\n", tbl->family);
+ pr_err("Registering multiple tables for family %d\n",
+ tbl->family);
dump_stack();
}
}
@@ -1571,7 +1572,7 @@ int neigh_table_clear(struct neigh_table *tbl)
pneigh_queue_purge(&tbl->proxy_queue);
neigh_ifdown(tbl, NULL);
if (atomic_read(&tbl->entries))
- printk(KERN_CRIT "neighbour leakage\n");
+ pr_crit("neighbour leakage\n");
write_lock(&neigh_tbl_lock);
for (tp = &neigh_tables; *tp; tp = &(*tp)->next) {
if (*tp == tbl) {
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 31a5ae5..dddbacb 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -1,3 +1,5 @@
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/workqueue.h>
#include <linux/rtnetlink.h>
#include <linux/cache.h>
@@ -212,8 +214,8 @@ static void net_free(struct net *net)
{
#ifdef NETNS_REFCNT_DEBUG
if (unlikely(atomic_read(&net->use_count) != 0)) {
- printk(KERN_EMERG "network namespace not free! Usage: %d\n",
- atomic_read(&net->use_count));
+ pr_emerg("network namespace not free! Usage: %d\n",
+ atomic_read(&net->use_count));
return;
}
#endif
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index ba6900f..09eda68 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -9,6 +9,8 @@
* Authors: Neil Horman <nhorman@tuxdriver.com>
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/types.h>
@@ -88,7 +90,7 @@ static void extend_netdev_table(struct net_device *dev, u32 new_len)
old_priomap = rtnl_dereference(dev->priomap);
if (!new_priomap) {
- printk(KERN_WARNING "Unable to alloc new priomap!\n");
+ pr_warn("Unable to alloc new priomap!\n");
return;
}
@@ -136,7 +138,7 @@ static struct cgroup_subsys_state *cgrp_create(struct cgroup *cgrp)
ret = get_prioidx(&cs->prioidx);
if (ret != 0) {
- printk(KERN_WARNING "No space in priority index array\n");
+ pr_warn("No space in priority index array\n");
kfree(cs);
return ERR_PTR(ret);
}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2a18719..7a10f08 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -36,6 +36,8 @@
* The functions in this file will not compile correctly with gcc 2.4.x
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/module.h>
#include <linux/types.h>
#include <linux/kernel.h>
@@ -118,11 +120,10 @@ static const struct pipe_buf_operations sock_pipe_buf_ops = {
*/
static void skb_over_panic(struct sk_buff *skb, int sz, void *here)
{
- printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p "
- "data:%p tail:%#lx end:%#lx dev:%s\n",
- here, skb->len, sz, skb->head, skb->data,
- (unsigned long)skb->tail, (unsigned long)skb->end,
- skb->dev ? skb->dev->name : "<NULL>");
+ pr_emerg("%s: text:%p len:%d put:%d head:%p data:%p tail:%#lx end:%#lx dev:%s\n",
+ __func__, here, skb->len, sz, skb->head, skb->data,
+ (unsigned long)skb->tail, (unsigned long)skb->end,
+ skb->dev ? skb->dev->name : "<NULL>");
BUG();
}
@@ -137,11 +138,10 @@ static void skb_over_panic(struct sk_buff *skb, int sz, void *here)
static void skb_under_panic(struct sk_buff *skb, int sz, void *here)
{
- printk(KERN_EMERG "skb_under_panic: text:%p len:%d put:%d head:%p "
- "data:%p tail:%#lx end:%#lx dev:%s\n",
- here, skb->len, sz, skb->head, skb->data,
- (unsigned long)skb->tail, (unsigned long)skb->end,
- skb->dev ? skb->dev->name : "<NULL>");
+ pr_emerg("%s: text:%p len:%d put:%d head:%p data:%p tail:%#lx end:%#lx dev:%s\n",
+ __func__, here, skb->len, sz, skb->head, skb->data,
+ (unsigned long)skb->tail, (unsigned long)skb->end,
+ skb->dev ? skb->dev->name : "<NULL>");
BUG();
}
diff --git a/net/core/sock.c b/net/core/sock.c
index 9d144ee..5efcd63 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -89,6 +89,8 @@
* 2 of the License, or (at your option) any later version.
*/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/capability.h>
#include <linux/errno.h>
#include <linux/types.h>
@@ -297,9 +299,8 @@ static int sock_set_timeout(long *timeo_p, char __user *optval, int optlen)
*timeo_p = 0;
if (warned < 10 && net_ratelimit()) {
warned++;
- printk(KERN_INFO "sock_set_timeout: `%s' (pid %d) "
- "tries to set negative timeout\n",
- current->comm, task_pid_nr(current));
+ pr_info("%s: `%s' (pid %d) tries to set negative timeout\n",
+ __func__, current->comm, task_pid_nr(current));
}
return 0;
}
@@ -317,8 +318,8 @@ static void sock_warn_obsolete_bsdism(const char *name)
static char warncomm[TASK_COMM_LEN];
if (strcmp(warncomm, current->comm) && warned < 5) {
strcpy(warncomm, current->comm);
- printk(KERN_WARNING "process `%s' is using obsolete "
- "%s SO_BSDCOMPAT\n", warncomm, name);
+ pr_warn("process `%s' is using obsolete %s SO_BSDCOMPAT\n",
+ warncomm, name);
warned++;
}
}
@@ -1238,8 +1239,8 @@ static void __sk_free(struct sock *sk)
sock_disable_timestamp(sk, SK_FLAGS_TIMESTAMP);
if (atomic_read(&sk->sk_omem_alloc))
- printk(KERN_DEBUG "%s: optmem leakage (%d bytes) detected.\n",
- __func__, atomic_read(&sk->sk_omem_alloc));
+ pr_debug("%s: optmem leakage (%d bytes) detected\n",
+ __func__, atomic_read(&sk->sk_omem_alloc));
if (sk->sk_peer_cred)
put_cred(sk->sk_peer_cred);
@@ -2424,7 +2425,7 @@ static void assign_proto_idx(struct proto *prot)
prot->inuse_idx = find_first_zero_bit(proto_inuse_idx, PROTO_INUSE_NR);
if (unlikely(prot->inuse_idx == PROTO_INUSE_NR - 1)) {
- printk(KERN_ERR "PROTO_INUSE_NR exhausted\n");
+ pr_err("PROTO_INUSE_NR exhausted\n");
return;
}
@@ -2454,8 +2455,8 @@ int proto_register(struct proto *prot, int alloc_slab)
NULL);
if (prot->slab == NULL) {
- printk(KERN_CRIT "%s: Can't create sock SLAB cache!\n",
- prot->name);
+ pr_crit("%s: Can't create sock SLAB cache!\n",
+ prot->name);
goto out;
}
@@ -2469,8 +2470,8 @@ int proto_register(struct proto *prot, int alloc_slab)
SLAB_HWCACHE_ALIGN, NULL);
if (prot->rsk_prot->slab == NULL) {
- printk(KERN_CRIT "%s: Can't create request sock SLAB cache!\n",
- prot->name);
+ pr_crit("%s: Can't create request sock SLAB cache!\n",
+ prot->name);
goto out_free_request_sock_slab_name;
}
}
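The effect of the `pr_fmt` definitions added above can be checked in userspace. In this sketch `KBUILD_MODNAME` is hard-coded (kbuild normally defines it on the compiler command line), and `pr_emerg()` is a stand-in that formats into a buffer instead of calling `printk()`. `toy_skb_over_panic()` and `log_equals()` are hypothetical helpers. The module prefix comes from `pr_fmt`, and the function name comes from `%s, __func__`, so the message stays correct if the function is renamed.

```c
#include <stdio.h>
#include <string.h>

/* Normally supplied by kbuild; hard-coded for this sketch. */
#define KBUILD_MODNAME "skbuff"
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

static char logbuf[256];	/* captures the last "log line" */

/* Userspace stand-in for pr_emerg(): formats into logbuf, not printk(). */
#define pr_emerg(fmt, ...) \
	snprintf(logbuf, sizeof(logbuf), pr_fmt(fmt), ##__VA_ARGS__)

/* Toy version of the converted skb_over_panic() message (BUG() omitted). */
static void toy_skb_over_panic(int len, int sz)
{
	pr_emerg("%s: len:%d put:%d\n", __func__, len, sz);
}

/* Convenience for comparing the captured line. */
static int log_equals(const char *expected)
{
	return strcmp(logbuf, expected) == 0;
}
```

Calling `toy_skb_over_panic(60, 1500)` leaves `"skbuff: toy_skb_over_panic: len:60 put:1500\n"` in the buffer.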
* Re: [PATCH v3 4/7] ARM: davinci: net: davinci_emac: add OF support
From: Heiko Schocher @ 2012-05-17 6:32 UTC (permalink / raw)
To: Nori, Sekhar
Cc: davinci-linux-open-source@linux.davincidsp.com,
linux-arm-kernel@lists.infradead.org,
devicetree-discuss@lists.ozlabs.org, netdev@vger.kernel.org,
Grant Likely, Wolfgang Denk, Anatoly Sivov
In-Reply-To: <DF0F476B391FA8409C78302C7BA518B63EA14401@DBDE01.ent.ti.com>
Hello Nori,
Thanks for the review!
Nori, Sekhar wrote:
> Hi Heiko,
>
> On Mon, Mar 05, 2012 at 16:40:01, Heiko Schocher wrote:
>> add of support for the davinci_emac driver.
>>
>> Signed-off-by: Heiko Schocher <hs@denx.de>
>> Cc: davinci-linux-open-source@linux.davincidsp.com
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: devicetree-discuss@lists.ozlabs.org
>> Cc: netdev@vger.kernel.org
>> Cc: Grant Likely <grant.likely@secretlab.ca>
>> Cc: Sekhar Nori <nsekhar@ti.com>
>> Cc: Wolfgang Denk <wd@denx.de>
>> Cc: Anatoly Sivov <mm05@mail.ru>
>>
>> ---
>> - changes for v2:
>> - add comment from Anatoly Sivov
>> - fix typo in davinci_emac.txt
>> - add comment from Grant Likely:
>> - add prefix "ti,davinci-" to davinci specific property names
>> - remove version property
>> - use compatible name "ti,davinci-dm6460-emac"
>> - use devm_kzalloc()
>> - use of_match_ptr()
>> - document all new properties
>> - remove of_address_to_resource() and do not overwrite
>> resource table
>> - whitespace fixes
>> - remove hw_ram_addr as it is not used in current
>> board code
>> - no changes for v3
>>
>> .../bindings/arm/davinci/davinci_emac.txt | 43 +++++++++
>> drivers/net/ethernet/ti/davinci_emac.c | 94 +++++++++++++++++++-
>> 2 files changed, 136 insertions(+), 1 deletions(-)
>> create mode 100644 Documentation/devicetree/bindings/arm/davinci/davinci_emac.txt
>>
>> diff --git a/Documentation/devicetree/bindings/arm/davinci/davinci_emac.txt b/Documentation/devicetree/bindings/arm/davinci/davinci_emac.txt
>> new file mode 100644
>> index 0000000..a7b0911
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/arm/davinci/davinci_emac.txt
>
> Since DaVinci EMAC driver may be used on platforms other than DaVinci (c6x,
> OMAP), can we place the bindings documentation in bindings/net/?
Done.
>> @@ -0,0 +1,43 @@
>> +* Texas Instruments Davinci EMAC
>> +
>> +This file provides information, what the device node
>> +for the davinci_emac interface contain.
>
> s/contain/contains
fixed.
>> +
>> +Required properties:
>> +- compatible: "ti,davinci-dm6460-emac";
>
> There is no device called dm6460. If you intend to only support
> "version 2" of the EMAC IP at this time, you can use dm6467
> (http://www.ti.com/product/tms320dm6467)
Renamed to "ti,davinci-dm6467-emac".
>> +- reg: Offset and length of the register set for the device
>> +- ti,davinci-ctrl-reg-offset: offset to control register
>> +- ti,davinci-ctrl-mod-reg-offset: offset to control module register
>> +- ti,davinci-ctrl-ram-offset: offset to control module ram
>> +- ti,davinci-ctrl-ram-size: size of control module ram
>> +- ti,davinci-rmii-en: use RMII
>> +- ti,davinci-no-bd-ram: has the emac controller BD RAM
>> +- phy-handle: Contains a phandle to an Ethernet PHY.
>> + if not, davinci_emac driver defaults to 100/FULL
>> +- interrupts: interrupt mapping for the davinci emac interrupts sources:
>> + 4 sources: <Receive Threshold Interrupt
>> + Receive Interrupt
>> + Transmit Interrupt
>> + Miscellaneous Interrupt>
>> +- pinmux-handle: Contains a handle to configure the pinmux settings.
removed.
>> +
>> +Optional properties:
>> +- local-mac-address : 6 bytes, mac address
>> +
>> +Example (enbw_cmc board):
>> + eth0: emac@1e20000 {
>> + compatible = "ti,davinci-dm6460-emac";
>> + reg = <0x220000 0x4000>;
>> + ti,davinci-ctrl-reg-offset = <0x3000>;
>> + ti,davinci-ctrl-mod-reg-offset = <0x2000>;
>> + ti,davinci-ctrl-ram-offset = <0>;
>> + ti,davinci-ctrl-ram-size = <0x2000>;
>> + local-mac-address = [ 00 00 00 00 00 00 ];
>> + interrupts = <33
>> + 34
>> + 35
>> + 36
>> + >;
>> + interrupt-parent = <&intc>;
>> + pinmux-handle = <&emac_pins>;
removed.
>> + };
>> diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
>> index 4fa0bcb..56e1c35 100644
>> --- a/drivers/net/ethernet/ti/davinci_emac.c
>> +++ b/drivers/net/ethernet/ti/davinci_emac.c
>> @@ -58,6 +58,12 @@
>> #include <linux/io.h>
>> #include <linux/uaccess.h>
>> #include <linux/davinci_emac.h>
>> +#include <linux/of.h>
>> +#include <linux/of_address.h>
>> +#include <linux/of_irq.h>
>> +#include <linux/of_net.h>
>> +
>> +#include <mach/mux.h>
>>
>> #include <asm/irq.h>
>> #include <asm/page.h>
>> @@ -339,6 +345,9 @@ struct emac_priv {
>> u32 rx_addr_type;
>> atomic_t cur_tx;
>> const char *phy_id;
>> +#ifdef CONFIG_OF
>> + struct device_node *phy_node;
>> +#endif
>> struct phy_device *phydev;
>> spinlock_t lock;
>> /*platform specific members*/
>> @@ -1760,6 +1769,82 @@ static const struct net_device_ops emac_netdev_ops = {
>> #endif
>> };
>>
>> +#ifdef CONFIG_OF
>> +static struct emac_platform_data
>> + *davinci_emac_of_get_pdata(struct platform_device *pdev,
>> + struct emac_priv *priv)
>> +{
>> + struct device_node *np;
>> + struct device_node *pinmux_np;
>> + struct emac_platform_data *pdata = NULL;
>> + const u8 *mac_addr;
>> + u32 data;
>> + int ret;
>> + int version;
>> +
>> + np = pdev->dev.of_node;
>> + if (!np)
>> + goto nodata;
>> + else
>> + version = EMAC_VERSION_2;
>
> You could set pdata->version directly here.
done.
>> +
>> + pdata = pdev->dev.platform_data;
>> + if (!pdata) {
>> + pdata = devm_kzalloc(&pdev->dev, sizeof(*pdata), GFP_KERNEL);
>> + if (!pdata)
>> + goto nodata;
>> + }
>> + pdata->version = version;
>> +
>> + mac_addr = of_get_mac_address(np);
>> + if (mac_addr)
>> + memcpy(pdata->mac_addr, mac_addr, ETH_ALEN);
>> +
>> + ret = of_property_read_u32(np, "ti,davinci-ctrl-reg-offset", &data);
>> + if (!ret)
>> + pdata->ctrl_reg_offset = data;
>> +
>> + ret = of_property_read_u32(np, "ti,davinci-ctrl-mod-reg-offset",
>> + &data);
>> + if (!ret)
>> + pdata->ctrl_mod_reg_offset = data;
>> +
>> + ret = of_property_read_u32(np, "ti,davinci-ctrl-ram-offset", &data);
>> + if (!ret)
>> + pdata->ctrl_ram_offset = data;
>> +
>> + ret = of_property_read_u32(np, "ti,davinci-ctrl-ram-size", &data);
>> + if (!ret)
>> + pdata->ctrl_ram_size = data;
>> +
>> + ret = of_property_read_u32(np, "ti,davinci-rmii-en", &data);
>> + if (!ret)
>> + pdata->rmii_en = data;
>> +
>> + ret = of_property_read_u32(np, "ti,davinci-no-bd-ram", &data);
>> + if (!ret)
>> + pdata->no_bd_ram = data;
>> +
>> + priv->phy_node = of_parse_phandle(np, "phy-handle", 0);
>> + if (!priv->phy_node)
>> + pdata->phy_id = "";
>> +
>> + pinmux_np = of_parse_phandle(np, "pinmux-handle", 0);
>> + if (pinmux_np)
>> + davinci_cfg_reg_of(pinmux_np);
>
> This is a DaVinci specific pinmux function and this
> driver can be used in non-DaVinci platforms like C6x
> and OMAP. So, it will not be correct to call a DaVinci
> specific function here.
Ah, right!
> Can you drop the pinmux from this patch for now? On DaVinci,
Done. Hmm, so I think I should drop this from all patches
in my patchset, right?
> for pinmux, we need to migrate to drivers/pinctrl/ as well.
Ah, I see. I will take a look at it; maybe I can find time to do
something here. Or do you know of any work in progress on this?
> Doing this will also make this patch independent of the rest
> of this series so it can even be merged separately. Can you please
> make these changes and resend just this patch?
Yep, I will do some testing with the changes you requested and resend
this patch. Is there a particular tree you would prefer me to use as
a base?
>> +
>> + pdev->dev.platform_data = pdata;
>> +nodata:
>> + return pdata;
>> +}
>> +#else
>> +static struct emac_platform_data
>> + *davinci_emac_of_get_pdata(struct platform_device *pdev,
>> + struct emac_priv *priv)
>> +{
>> + return pdev->dev.platform_data;
>> +}
>> +#endif
>> /**
>> * davinci_emac_probe: EMAC device probe
>> * @pdev: The DaVinci EMAC device that we are removing
>> @@ -1803,7 +1888,7 @@ static int __devinit davinci_emac_probe(struct platform_device *pdev)
>>
>> spin_lock_init(&priv->lock);
>>
>> - pdata = pdev->dev.platform_data;
>> + pdata = davinci_emac_of_get_pdata(pdev, priv);
>> if (!pdata) {
>> dev_err(&pdev->dev, "no platform data\n");
>> rc = -ENODEV;
>> @@ -2013,6 +2098,12 @@ static const struct dev_pm_ops davinci_emac_pm_ops = {
>> .resume = davinci_emac_resume,
>> };
>>
>> +static const struct of_device_id davinci_emac_of_match[] = {
>> + {.compatible = "ti,davinci-dm6460-emac", },
>
> This needs to be ti,davinci-dm6467-emac as well.
Yep, fixed.
bye,
Heiko
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
* [patch] isdn: remove duplicate NULL check
From: Dan Carpenter @ 2012-05-17 6:51 UTC (permalink / raw)
To: Karsten Keil
Cc: David Howells, Phil Carmody, netdev, kernel-janitors,
David S. Miller
We test both "!skb_out" and "skb_out" here, which is duplicative and
causes a static checker warning. I considered that the intent might
have been to test "skb_in", but that's a valid pointer here.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
diff --git a/drivers/isdn/i4l/isdn_bsdcomp.c b/drivers/isdn/i4l/isdn_bsdcomp.c
index c59e8d2..8837ac5 100644
--- a/drivers/isdn/i4l/isdn_bsdcomp.c
+++ b/drivers/isdn/i4l/isdn_bsdcomp.c
@@ -612,7 +612,7 @@ static int bsd_compress(void *state, struct sk_buff *skb_in, struct sk_buff *skb
db->n_bits++;
/* If output length is too large then this is an incompressible frame. */
- if (!skb_out || (skb_out && skb_out->len >= skb_in->len)) {
+ if (!skb_out || skb_out->len >= skb_in->len) {
++db->incomp_count;
db->incomp_bytes += isize;
return 0;
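Because C's || short-circuits, the right-hand operand only runs when !skb_out is false. At that point skb_out is already known to be non-NULL, so the inner "skb_out &&" is always true. The small userspace check below confirms that the two conditions agree. It assumes a hypothetical fake_skb that has only a len field and stands in for struct sk_buff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one sk_buff field the condition reads. */
struct fake_skb {
	unsigned int len;
};

/* Original test: "skb_out &&" is only reached when skb_out != NULL. */
bool incompressible_old(const struct fake_skb *skb_out,
			const struct fake_skb *skb_in)
{
	assert(skb_in != NULL);	/* skb_in is a valid pointer here */
	return !skb_out || (skb_out && skb_out->len >= skb_in->len);
}

/* Simplified test from the patch: skb_out->len is still only read when
 * skb_out is non-NULL, thanks to short-circuit evaluation. */
bool incompressible_new(const struct fake_skb *skb_out,
			const struct fake_skb *skb_in)
{
	assert(skb_in != NULL);
	return !skb_out || skb_out->len >= skb_in->len;
}

/* Compare both forms for NULL and for shorter, equal and longer outputs. */
bool forms_agree(void)
{
	struct fake_skb in = { .len = 100 };
	struct fake_skb outs[] = { { 50 }, { 100 }, { 150 } };
	size_t i;

	if (incompressible_old(NULL, &in) != incompressible_new(NULL, &in))
		return false;
	for (i = 0; i < sizeof(outs) / sizeof(outs[0]); i++)
		if (incompressible_old(&outs[i], &in) !=
		    incompressible_new(&outs[i], &in))
			return false;
	return true;
}
```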
* all networking trees merged
From: David Miller @ 2012-05-17 7:18 UTC (permalink / raw)
To: netdev
Earlier this evening I merged:
Linus's tree --> net --> net-next
Just FYI...
* Re: [PATCH v3 4/7] ARM: davinci: net: davinci_emac: add OF support
From: Sekhar Nori @ 2012-05-17 7:21 UTC (permalink / raw)
To: hs
Cc: davinci-linux-open-source@linux.davincidsp.com,
linux-arm-kernel@lists.infradead.org,
devicetree-discuss@lists.ozlabs.org, netdev@vger.kernel.org,
Grant Likely, Wolfgang Denk, Anatoly Sivov
In-Reply-To: <4FB49B7B.60506@denx.de>
On 5/17/2012 12:02 PM, Heiko Schocher wrote:
> Nori, Sekhar wrote:
>> On Mon, Mar 05, 2012 at 16:40:01, Heiko Schocher wrote:
>>> +#ifdef CONFIG_OF
>>> +static struct emac_platform_data
>>> + *davinci_emac_of_get_pdata(struct platform_device *pdev,
>>> + struct emac_priv *priv)
>>> +{
>>> + struct device_node *np;
>>> + struct device_node *pinmux_np;
>>> + struct emac_platform_data *pdata = NULL;
>>> + const u8 *mac_addr;
>>> + u32 data;
>>> + int ret;
>>> + int version;
>>> +
>>> + np = pdev->dev.of_node;
>>> + if (!np)
>>> + goto nodata;
>>> + else
>>> + version = EMAC_VERSION_2;
>>
>> You could set pdata->version directly here.
>
> done.
I just noticed that pdata is not set up at this point. I guess you will be
moving some code around to do this.
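The point here is ordering. pdata->version can only be written once pdata points at valid memory, which means after the platform_data/devm_kzalloc() step. A userspace sketch of the reordered flow follows. fake_node, fake_read_u32() and get_pdata() are hypothetical stand-ins for the device_node, of_property_read_u32() and davinci_emac_of_get_pdata() used in the patch. calloc() stands in for devm_kzalloc(), and the EMAC_VERSION_2 value is assumed. Falling back to board-supplied platform data when there is no DT node is also an assumption of the sketch, not something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define EMAC_VERSION_2 2	/* hypothetical value for the sketch */

/* Cut-down, hypothetical versions of the kernel structures. */
struct emac_platform_data {
	int version;
	unsigned int ctrl_reg_offset;
	unsigned int ctrl_ram_size;
};

struct fake_prop { const char *name; unsigned int value; };

struct fake_node {
	const struct fake_prop *props;
	size_t nprops;
};

/* Mimics of_property_read_u32(): 0 on success, -EINVAL if absent. */
static int fake_read_u32(const struct fake_node *np, const char *name,
			 unsigned int *out)
{
	size_t i;

	assert(np != NULL && name != NULL && out != NULL);
	for (i = 0; i < np->nprops; i++) {
		if (strcmp(np->props[i].name, name) == 0) {
			*out = np->props[i].value;
			return 0;
		}
	}
	return -EINVAL;
}

/* Reordered flow: make sure pdata exists first, then fill it in. */
struct emac_platform_data *get_pdata(const struct fake_node *np,
				     struct emac_platform_data *board_pdata)
{
	struct emac_platform_data *pdata = board_pdata;
	unsigned int data;

	if (!np)
		return board_pdata;	/* assumption: keep board data */

	if (!pdata) {
		pdata = calloc(1, sizeof(*pdata));	/* devm_kzalloc() */
		if (!pdata)
			return NULL;
	}

	pdata->version = EMAC_VERSION_2;	/* safe: pdata is valid now */

	/* Optional properties only override fields when present. */
	if (!fake_read_u32(np, "ti,davinci-ctrl-reg-offset", &data))
		pdata->ctrl_reg_offset = data;
	if (!fake_read_u32(np, "ti,davinci-ctrl-ram-size", &data))
		pdata->ctrl_ram_size = data;

	return pdata;
}
```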
>>> +
>>> + pinmux_np = of_parse_phandle(np, "pinmux-handle", 0);
>>> + if (pinmux_np)
>>> + davinci_cfg_reg_of(pinmux_np);
>>
>> This is a DaVinci specific pinmux function and this
>> driver can be used in non-DaVinci platforms like C6x
>> and OMAP. So, it will not be correct to call a DaVinci
>> specific function here.
>
> Ah, right!
>
>> Can you drop the pinmux from this patch for now? On DaVinci,
>
> Done. Hmm, so I think I should drop this from all patches
> in my patchset, right?
Yes.
>
>> for pinmux, we need to migrate to drivers/pinctrl/ as well.
>
> Ah, I see. I will take a look at it; maybe I can find time to do
> something here. Or do you know of any work in progress on this?
There is no work in progress within TI. So if you are interested in
taking a stab at it, that would be great.
>
>> Doing this will also make this patch independent of the rest
>> of this series so it can even be merged separately. Can you please
>> make these changes and resend just this patch?
>
> Yep, I will do some testing with the changes you requested and resend
> this patch. Is there a particular tree you would prefer me to use as
> a base?
This one should ideally be merged through the network subsystem, so maybe
base it on the net-next tree?
http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=summary
Thanks,
Sekhar
* Re: [Bridge] [PATCH] net/bridge/netfilter: Fix the randconfig warning
From: Bart De Schuymer @ 2012-05-17 7:21 UTC (permalink / raw)
To: Devendra Naga
Cc: Pablo Neira Ayuso, Patrick McHardy, Stephen Hemminger,
David S. Miller, netfilter-devel, netfilter, coreteam, bridge,
netdev
In-Reply-To: <1337023817-26132-1-git-send-email-devendra.aaru@gmail.com>
On 14/05/2012 21:30, Devendra Naga wrote:
> When running make randconfig I got
>
> warning: (BRIDGE_NF_EBTABLES) selects NETFILTER_XTABLES which has unmet direct dependencies (NET && INET && NETFILTER)
>
> I added a NET && INET && NETFILTER dependency to BRIDGE_NF_EBTABLES.
I really don't see why ebtables should depend on INET. To me the issue
lies in the fact that xtables depends on INET.
Bart
--
Bart De Schuymer
www.artinalgorithms.be
* Re: [PATCH net-next v2 0/2] 6lowpan: code updates
From: Alexander Smirnov @ 2012-05-17 7:31 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20120516.152801.915753046212672140.davem@davemloft.net>
Got it!
Damn, my fault, sorry for that.
Alex
2012/5/16 David Miller <davem@davemloft.net>:
> From: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
> Date: Wed, 16 May 2012 11:27:26 +0400
>
>> 6lowpan: rework data fetching from skb
>
> This is not what we told you to do.
>
> We told you that IF you were going to emit a warning message
> for the pskb_may_pull() failure condition, you should use
> WARN_ON_ONCE() so that it doesn't potentially flood the
> logs.
>
> But you must always, in every case, handle the error in some
> reasonable way, not just when WARN_ON_ONCE() does that initial
> one-and-only trigger.
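The review point is that WARN_ON_ONCE() only limits how often the warning is printed. The failure path must still run every time. The userspace sketch below assumes a hand-rolled WARN_ON_ONCE(), modeled on the kernel macro, which also evaluates to the tested condition. It also uses a hypothetical may_pull() in place of pskb_may_pull(), and returning -EINVAL is an arbitrary choice for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

int warn_count;	/* number of warnings actually printed */

/* Userspace stand-in for WARN_ON_ONCE() (GNU statement expression, as in
 * the kernel): yields the condition every time, prints only the first time. */
#define WARN_ON_ONCE(cond) ({						\
	static bool wo_warned;						\
	bool wo_ret = !!(cond);						\
	if (wo_ret && !wo_warned) {					\
		wo_warned = true;					\
		warn_count++;						\
		fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__); \
	}								\
	wo_ret;								\
})

/* Hypothetical stand-in for pskb_may_pull(): are 'needed' bytes available? */
static bool may_pull(size_t avail, size_t needed)
{
	return avail >= needed;
}

/* Hypothetical header fetch: warn at most once, but fail on every short skb. */
int fetch_header(size_t avail, size_t needed)
{
	if (WARN_ON_ONCE(!may_pull(avail, needed)))
		return -EINVAL;	/* handled on every occurrence */
	return 0;
}
```

Calling fetch_header(2, 4) twice returns -EINVAL both times, but only one warning is printed.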