* Re: BUG: unable to handle kernel paging request at 000041ed00000001
From: Arturas @ 2010-06-14 9:27 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1276504295.2478.35.camel@edumazet-laptop>
On Jun 14, 2010, at 11:31 AM, Eric Dumazet wrote:
> But your problem is about bridge, not bonding (see trace).
I want it for performance reasons, not because of this bug.
The bridge isn't a bottleneck for me, but bonding may be, and not only for me
but for many people. I believe the performance gain would be more
than 1% of CPU? :-)
>
> And 2.6.34 won't accept such changes, it's already released.
It could go in as a separate patch, or I can test 2.6.35 if it would accept
such a change. I just need a stable kernel with good performance :-)
>
>> I also have another issue with NMI. On an older machine with 5500 Xeons I
>> have almost no overhead with nmi_watchdog enabled, but on this one it is about double.
>> Without NMI enabled the CPU peak average is 30%, and with NMI enabled I have 53%.
>> When no traffic is passing, all CPUs are 100% idle.
>> Maybe the overhead could be a little bit smaller? :-)
>>
>
> I am a bit lost here, NMI has little to do with the network stack ;)
Could this be related to the very recent CPU? As I understand it, NMI depends on the CPU.
>
>
> Could you please test another patch ?
Applied; it's working correctly so far. If I get a warning I'll write to you, or
should I not get one at all if the patch is correct?
>
> Before calling sk_tx_queue_set(sk, queue_index); we should check if dst
> dev is current device.
^ permalink raw reply
* [PATCH 1/3] netxen: fix memory leaks in error path
From: Amit Kumar Salecha @ 2010-06-14 9:39 UTC (permalink / raw)
To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1276508345-17070-1-git-send-email-amit.salecha@qlogic.com>
Fixes memory leak in error path when memory allocation
for adapter data structures fails.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
drivers/net/netxen/netxen_nic_ctx.c | 3 ++-
drivers/net/netxen/netxen_nic_init.c | 4 ++--
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/netxen/netxen_nic_ctx.c b/drivers/net/netxen/netxen_nic_ctx.c
index f26e547..3a41b6a 100644
--- a/drivers/net/netxen/netxen_nic_ctx.c
+++ b/drivers/net/netxen/netxen_nic_ctx.c
@@ -629,7 +629,8 @@ int netxen_alloc_hw_resources(struct netxen_adapter *adapter)
if (addr == NULL) {
dev_err(&pdev->dev, "%s: failed to allocate tx desc ring\n",
netdev->name);
- return -ENOMEM;
+ err = -ENOMEM;
+ goto err_out_free;
}
tx_ring->desc_head = (struct cmd_desc_type0 *)addr;
diff --git a/drivers/net/netxen/netxen_nic_init.c b/drivers/net/netxen/netxen_nic_init.c
index 045a7c8..e08527f 100644
--- a/drivers/net/netxen/netxen_nic_init.c
+++ b/drivers/net/netxen/netxen_nic_init.c
@@ -218,7 +218,7 @@ int netxen_alloc_sw_resources(struct netxen_adapter *adapter)
if (cmd_buf_arr == NULL) {
dev_err(&pdev->dev, "%s: failed to allocate cmd buffer ring\n",
netdev->name);
- return -ENOMEM;
+ goto err_out;
}
memset(cmd_buf_arr, 0, TX_BUFF_RINGSIZE(tx_ring));
tx_ring->cmd_buf_arr = cmd_buf_arr;
@@ -230,7 +230,7 @@ int netxen_alloc_sw_resources(struct netxen_adapter *adapter)
if (rds_ring == NULL) {
dev_err(&pdev->dev, "%s: failed to allocate rds ring struct\n",
netdev->name);
- return -ENOMEM;
+ goto err_out;
}
recv_ctx->rds_rings = rds_ring;
--
1.6.0.2
^ permalink raw reply related
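The error-path fix in the patch above follows the common idiom of jumping to a shared unwind label instead of returning directly, so that earlier allocations are released. A minimal userspace sketch of that idiom follows. The names demo_rings, demo_alloc_rings, demo_free_rings and the fail_second switch are hypothetical and are not part of the netxen driver:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the adapter's rings; not the netxen structures. */
struct demo_rings {
	void *cmd_buf;
	void *rds_ring;
};

/*
 * Allocate both buffers or neither. fail_second simulates a failure of
 * the second allocation so the unwind path can be exercised.
 */
int demo_alloc_rings(struct demo_rings *r, int fail_second)
{
	assert(r != NULL);
	r->cmd_buf = NULL;
	r->rds_ring = NULL;

	r->cmd_buf = malloc(64);
	if (r->cmd_buf == NULL)
		goto err_out;

	r->rds_ring = fail_second ? NULL : malloc(64);
	if (r->rds_ring == NULL)
		goto err_out;

	return 0;

err_out:
	/* free(NULL) is a no-op, so one label can undo any partial state. */
	free(r->rds_ring);
	free(r->cmd_buf);
	r->rds_ring = NULL;
	r->cmd_buf = NULL;
	return -1;
}

/* Release everything a successful demo_alloc_rings() handed out. */
void demo_free_rings(struct demo_rings *r)
{
	free(r->rds_ring);
	free(r->cmd_buf);
	r->rds_ring = NULL;
	r->cmd_buf = NULL;
}
```

With a direct return in place of the goto, the first buffer would stay allocated whenever the second allocation failed, which is the kind of leak the patch description refers to.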
* [PATCH 2/3] netxen: fix rcv buffer leak
From: Amit Kumar Salecha @ 2010-06-14 9:39 UTC (permalink / raw)
To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1276508345-17070-1-git-send-email-amit.salecha@qlogic.com>
Rcv producer should be read while holding the spin-lock.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
drivers/net/netxen/netxen_nic_init.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/net/netxen/netxen_nic_init.c b/drivers/net/netxen/netxen_nic_init.c
index e08527f..c865dda 100644
--- a/drivers/net/netxen/netxen_nic_init.c
+++ b/drivers/net/netxen/netxen_nic_init.c
@@ -1805,9 +1805,10 @@ netxen_post_rx_buffers(struct netxen_adapter *adapter, u32 ringid,
netxen_ctx_msg msg = 0;
struct list_head *head;
+ spin_lock(&rds_ring->lock);
+
producer = rds_ring->producer;
- spin_lock(&rds_ring->lock);
head = &rds_ring->free_list;
while (!list_empty(head)) {
@@ -1829,7 +1830,6 @@ netxen_post_rx_buffers(struct netxen_adapter *adapter, u32 ringid,
producer = get_next_index(producer, rds_ring->num_desc);
}
- spin_unlock(&rds_ring->lock);
if (count) {
rds_ring->producer = producer;
@@ -1853,6 +1853,8 @@ netxen_post_rx_buffers(struct netxen_adapter *adapter, u32 ringid,
NETXEN_RCV_PRODUCER_OFFSET), msg);
}
}
+
+ spin_unlock(&rds_ring->lock);
}
static void
@@ -1864,10 +1866,11 @@ netxen_post_rx_buffers_nodb(struct netxen_adapter *adapter,
int producer, count = 0;
struct list_head *head;
- producer = rds_ring->producer;
if (!spin_trylock(&rds_ring->lock))
return;
+ producer = rds_ring->producer;
+
head = &rds_ring->free_list;
while (!list_empty(head)) {
--
1.6.0.2
^ permalink raw reply related
* [PATCH 0/3] netxen: bug fixes
From: Amit Kumar Salecha @ 2010-06-14 9:39 UTC (permalink / raw)
To: davem; +Cc: netdev, ameen.rahman
Hi,
Sending a series of 3 bug fixes. Please apply them to net-2.6.
-Amit
^ permalink raw reply
* [PATCH 3/3] netxen: fix caching window register
From: Amit Kumar Salecha @ 2010-06-14 9:39 UTC (permalink / raw)
To: davem; +Cc: netdev, ameen.rahman
In-Reply-To: <1276508345-17070-1-git-send-email-amit.salecha@qlogic.com>
CRB window register is not per pci-func for NX3031,
so caching can result in incorrect values.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
drivers/net/netxen/netxen_nic_hw.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)
diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index 5c496f8..29d7b93 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -1159,9 +1159,6 @@ netxen_nic_pci_set_crbwindow_2M(struct netxen_adapter *adapter, ulong off)
window = CRB_HI(off);
- if (adapter->ahw.crb_win == window)
- return;
-
writel(window, addr);
if (readl(addr) != window) {
if (printk_ratelimit())
@@ -1169,7 +1166,6 @@ netxen_nic_pci_set_crbwindow_2M(struct netxen_adapter *adapter, ulong off)
"failed to set CRB window to %d off 0x%lx\n",
window, off);
}
- adapter->ahw.crb_win = window;
}
static void __iomem *
--
1.6.0.2
^ permalink raw reply related
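The reasoning in the commit message above can be illustrated with a small model: when one register is shared by several PCI functions, a per-function cache of the last written value can go stale. This is only a sketch under that assumption. demo_func, demo_hw_window and the helper names are hypothetical and do not come from the netxen driver:

```c
/* Hypothetical model: one window register shared by all PCI functions. */
static unsigned int demo_hw_window;

/* Per-function state holding the value this function last wrote. */
struct demo_func {
	unsigned int cached_win;
};

/* Old behaviour: skip the write when the private cache already matches. */
void demo_set_window_cached(struct demo_func *f, unsigned int win)
{
	if (f->cached_win == win)
		return;
	demo_hw_window = win;
	f->cached_win = win;
}

/* Patched behaviour: always write, another function may have changed it. */
void demo_set_window_uncached(unsigned int win)
{
	demo_hw_window = win;
}

/* Read back the shared register. */
unsigned int demo_read_window(void)
{
	return demo_hw_window;
}
```

If function A selects window 1, function B selects window 2, and A then asks for window 1 again, the cached variant skips the write and the shared register still holds 2. The uncached variant always leaves the requested value in place.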
* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to classes
From: Kay Sievers @ 2010-06-14 9:39 UTC (permalink / raw)
To: Johannes Berg; +Cc: Eric W. Biederman, Greg KH, netdev
In-Reply-To: <1276507247.3926.13.camel@jlt3.sipsolutions.net>
On Mon, Jun 14, 2010 at 11:20, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Mon, 2010-06-14 at 11:13 +0200, Kay Sievers wrote:
>
>> That would block the rmmod process until the resources are cleaned up,
>> wouldn't it?
>
> Yes, would that be so bad?
Sounds fine, as long as we can expect that nothing else can hold a reference
and block rmmod for a very long time with this logic. Sysfs access
should be fine these days, Tejun changed this a while ago.
> It just needs a wait_for_bus_exit() function that the module calls in
> _exit?
Sounds worth trying, to see if that works as expected.
Kay
^ permalink raw reply
* Re: [PATCH] fragment: add fast path
From: Changli Gao @ 2010-06-14 10:15 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1276499096.2478.25.camel@edumazet-laptop>
On Mon, Jun 14, 2010 at 3:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> The concept of 'fast path' has changed over the years. It used to be about CPU
> instructions and cycles; it's now about the number of memory transactions.
>
> The only things we need to address are the cache lines we must bring into
> CPU caches, and keeping the code short.
>
> These days, one cache line miss costs more than one hundred instructions
> that could have been executed during the CPU stall. CPU cycles are cheap if the
> code is already in the instruction cache.
>
> Adding a test to avoid entering a NULL loop (no fragment is stored yet)
> just bloats the code, making it larger than necessary.
>
> You don't need the else branch:
>
> if (prev) {
> 	if (FRAG_CB(prev)->offset < offset) {
> 		next = NULL;
> 		goto found;
> 	}
> } else {
> 	next = NULL;
> 	goto found;
> }
>
> Just write:
>
> next = NULL;
> if (prev && FRAG_CB(prev)->offset < offset)
> 	goto found;
>
Thanks.
I think the following code is better:
if (!prev || FRAG_CB(prev)->offset < offset) {
	next = NULL;
	goto found;
}
--
Regards,
Changli Gao(xiaosuo@gmail.com)
^ permalink raw reply
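The single test proposed in the reply above can be checked against the nested form from the review. The sketch below assumes the nested form means "take the fast path when there is no previous fragment, or when the previous fragment starts before the new offset". demo_frag and both helper names are hypothetical stand-ins for the kernel's FRAG_CB() bookkeeping:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued fragment; only the offset matters. */
struct demo_frag {
	int offset;
};

/* Nested form: append if the queue is empty or the tail lies before us. */
bool demo_append_nested(const struct demo_frag *prev, int offset)
{
	if (prev) {
		if (prev->offset < offset)
			return true;
	} else {
		return true;
	}
	return false;
}

/* Merged form: the same decision expressed as one condition. */
bool demo_append_merged(const struct demo_frag *prev, int offset)
{
	return !prev || prev->offset < offset;
}
```

Both helpers return the same answer for an empty queue, for a tail that lies before the new offset, and for a tail at or beyond it. Only in the last case does the caller have to fall back to walking the list.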
* Re: [PATCH 0/1] Bluetooth: hidp: Add support for hidraw HIDIOCGFEATURE and HIDIOCSFEATURE
From: Alan Ott @ 2010-06-14 11:45 UTC (permalink / raw)
To: Marcel Holtmann, David S. Miller, Jiri Kosina, Michael Poole,
Bastien Nocera
In-Reply-To: <4C155945.1030500@signal11.us>
On 06/13/2010 06:18 PM, Alan Ott wrote:
> 3. A blocking, synchronous GET_REPORT transfer was easy when I
> implemented this for USB because data is both sent and received as
> part of a single control transfer. Because of the nature of Bluetooth
> however, where it is viewed more as an asynchronous network device,
> and with hidraw allowing multiple handles to a single device to exist,
> there could be a race when two handles call the hidp_get_raw_report()
> function concurrently, requesting the same report. I've convinced
> myself that this is not a problem, because since both callers
> requested the same report, the worst that could happen is that one
> could get a report which is slightly out of date.
>
> Consider the following case:
> 1. Client 1 requests report (Userspace call to HIDIOCGFEATURE)
> 2. Client 2 requests report (Userspace call to HIDIOCGFEATURE)
> 3. Client 1's report is returned, and delivered to BOTH clients
> 4. Client 2's report is returned (and discarded)
>
> Note here that Client 1's report and Client 2's report are the same
> report, ie: they reflect the state of the same data on the device,
> just at different times. In this case, they are indeed exactly the
> same data, but consider this case:
> 1. Client 1 requests report (Userspace call to HIDIOCGFEATURE)
> 2. Client 2 SETS report (Userspace call to HIDIOCSFEATURE)
> 3. Client 2 requests report (Userspace call to HIDIOCGFEATURE)
> 4. Client 1's report is returned, and delivered to Clients 1 and 2
> 5. Client 2's report is returned
>
> In this case, client 2 receives OLD data (since it set new data, and
> the call to write the reports is currently not synchronous). To make
> writes synchronous, we'd run into the same problem, of two writes
> happening concurrently, and the 2nd one receiving the ACK from the
> first one.
>
> Alan.
>
I just remembered to look at the hidraw.c source, and saw that the
hid_get_raw_report() function pointer (which points to
hidp_get_raw_report()) is called with a global mutex held. I believe
this will prevent the race mentioned in #3 above, provided that all
clients are communicating with the device using hidraw. Of course, the
situation above could still occur if one of the clients is an
actual driver (which isn't subject to the hidraw mutex).
^ permalink raw reply
* [PATCH net-next-2.6] net: NET_SKB_PAD should depend on L1_CACHE_BYTES
From: Eric Dumazet @ 2010-06-14 12:57 UTC (permalink / raw)
To: David Miller
Cc: alexander.h.duyck, jeffrey.t.kirsher, mingo, tglx, hpa, x86,
linux-kernel, netdev, gospo
In-Reply-To: <20100610.222006.242136751.davem@davemloft.net>
On Thursday, June 10, 2010 at 22:20 -0700, David Miller wrote:
> Eric, why don't we do that? Make NET_SKB_PAD's define L1_CACHE_BYTES.
>
> Reading the comments you added when the default value was changed to
> 64, this seems to even be your overall intent. :-)
Of course right you are ;)
Thanks !
[PATCH net-next-2.6] net: NET_SKB_PAD should depend on L1_CACHE_BYTES
In old kernels, NET_SKB_PAD was defined to 16.
Then commit d6301d3dd1c2 (net: Increase default NET_SKB_PAD to 32), and
commit 18e8c134f4e9 (net: Increase NET_SKB_PAD to 64 bytes) increased it
to 64.
While the first patch was governed by network stack needs, the second was more
driven by performance issues on current hardware. The real intent was to
align data on a cache line boundary.
So use max(32, L1_CACHE_BYTES) instead of 64, to be more generic.
Remove the microblaze and powerpc arch-specific NET_SKB_PAD definitions.
Thanks to Alexander Duyck and David Miller for their comments.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
arch/microblaze/include/asm/system.h | 3 ---
arch/powerpc/include/asm/system.h | 3 ---
include/linux/skbuff.h | 8 +++++---
3 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/arch/microblaze/include/asm/system.h b/arch/microblaze/include/asm/system.h
index 48c4f03..81e1f7d 100644
--- a/arch/microblaze/include/asm/system.h
+++ b/arch/microblaze/include/asm/system.h
@@ -101,10 +101,7 @@ extern struct dentry *of_debugfs_root;
* MicroBlaze doesn't handle unaligned accesses in hardware.
*
* Based on this we force the IP header alignment in network drivers.
- * We also modify NET_SKB_PAD to be a cacheline in size, thus maintaining
- * cacheline alignment of buffers.
*/
#define NET_IP_ALIGN 2
-#define NET_SKB_PAD L1_CACHE_BYTES
#endif /* _ASM_MICROBLAZE_SYSTEM_H */
diff --git a/arch/powerpc/include/asm/system.h b/arch/powerpc/include/asm/system.h
index a6297c6..6c294ac 100644
--- a/arch/powerpc/include/asm/system.h
+++ b/arch/powerpc/include/asm/system.h
@@ -515,11 +515,8 @@ __cmpxchg_local(volatile void *ptr, unsigned long old, unsigned long new,
* powers of 2 writes until it reaches sufficient alignment).
*
* Based on this we disable the IP header alignment in network drivers.
- * We also modify NET_SKB_PAD to be a cacheline in size, thus maintaining
- * cacheline alignment of buffers.
*/
#define NET_IP_ALIGN 0
-#define NET_SKB_PAD L1_CACHE_BYTES
#define cmpxchg64(ptr, o, n) \
({ \
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 122d083..ac74ee0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1414,12 +1414,14 @@ static inline int skb_network_offset(const struct sk_buff *skb)
*
* Various parts of the networking layer expect at least 32 bytes of
* headroom, you should not reduce this.
- * With RPS, we raised NET_SKB_PAD to 64 so that get_rps_cpus() fetches span
- * a 64 bytes aligned block to fit modern (>= 64 bytes) cache line sizes
+ *
+ * Using max(32, L1_CACHE_BYTES) makes sense (especially with RPS)
+ * to reduce average number of cache lines per packet.
+ * get_rps_cpus() for example only access one 64 bytes aligned block :
* NET_IP_ALIGN(2) + ethernet_header(14) + IP_header(20/40) + ports(8)
*/
#ifndef NET_SKB_PAD
-#define NET_SKB_PAD 64
+#define NET_SKB_PAD max(32, L1_CACHE_BYTES)
#endif
extern int ___pskb_trim(struct sk_buff *skb, unsigned int len);
^ permalink raw reply related
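The arithmetic behind the new definition can be spelled out in a few lines. This is only a userspace sketch: demo_skb_pad() restates max(32, L1_CACHE_BYTES) as a function, and demo_rps_bytes() adds up the header sizes listed in the skbuff.h comment. Both names are hypothetical:

```c
/* Restates the patch's max(32, L1_CACHE_BYTES) for a given line size. */
unsigned int demo_skb_pad(unsigned int l1_cache_bytes)
{
	return l1_cache_bytes > 32 ? l1_cache_bytes : 32;
}

/*
 * Bytes named in the skbuff.h comment:
 * NET_IP_ALIGN + ethernet header (14) + IP header (20 or 40) + ports (8).
 */
unsigned int demo_rps_bytes(unsigned int ip_align, int is_ipv6)
{
	return ip_align + 14 + (is_ipv6 ? 40 : 20) + 8;
}
```

With NET_IP_ALIGN of 2 this gives 44 bytes for IPv4 and 64 bytes for IPv6, so the headers fit in one 64-byte block. The pad itself stays at 32 on machines with 16- or 32-byte cache lines and follows the line size on machines with 64-byte or larger lines.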
* Re: [PATCH v2] net: deliver skbs on inactive slaves to exact matches
From: Michael S. Tsirkin @ 2010-06-14 13:21 UTC (permalink / raw)
To: John Fastabend; +Cc: fubar, davem, nhorman, bonding-devel, netdev
In-Reply-To: <20100603193011.4916.12354.stgit@jf-dev2-dcblab>
On Thu, Jun 03, 2010 at 12:30:11PM -0700, John Fastabend wrote:
> Currently, the accelerated receive path for VLAN's will
> drop packets if the real device is an inactive slave and
> is not one of the special pkts tested for in
> skb_bond_should_drop(). This behavior is different than
> the non-accelerated path and for pkts over a bonded vlan.
>
> For example,
>
> vlanx -> bond0 -> ethx
>
> will be dropped in the vlan path and not delivered to any
> packet handlers at all. However,
>
> bond0 -> vlanx -> ethx
>
> and
>
> bond0 -> ethx
>
> will be delivered to handlers that match the exact dev,
> because the VLAN path checks the real_dev which is not a
> slave and netif_recv_skb() doesn't drop frames but only
> delivers them to exact matches.
>
> This patch adds a sk_buff flag which is used for tagging
> skbs that would previously have been dropped and allows the
> skb to continue to skb_netif_recv(). Here we add
> logic to check for the deliver_no_wcard flag and if it
> is set only deliver to handlers that match exactly. This
> makes both paths above consistent and gives pkt handlers
> a way to identify skbs that come from inactive slaves.
> Without this patch in some configurations skbs will be
> delivered to handlers with exact matches and in others
> be dropped outright in the vlan path.
>
> I have tested the following 4 configurations in failover modes
> and load balancing modes.
>
> # bond0 -> ethx
>
> # vlanx -> bond0 -> ethx
>
> # bond0 -> vlanx -> ethx
>
> # bond0 -> ethx
> |
> vlanx -> --
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
I am using qemu with both tap and slirp (userspace) networking.
This works fine under 2.6.35-rc2 but breaks under 2.6.35-rc3:
ssh over slirp stops working, sometimes right away
and sometimes after a bit of use; the connection times out.
Git bisect gave me this commit:
597a264b1a9c7e36d1728f677c66c5c1f7e3b837.
Reverting 597a264b1a9c7e36d1728f677c66c5c1f7e3b837 fixes the issue
for me.
I'm short on time now, so I didn't debug this further.
I opened a bugzilla to track this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=16204
--
MST
^ permalink raw reply
* [PATCH] iproute: fix tc generating ipv6 priority filter
From: Petr Lautrbach @ 2010-06-14 13:36 UTC (permalink / raw)
To: shemminger; +Cc: netdev, Petr Lautrbach
This patch adds an IPv6 filter priority/traffic class parsing function,
static int parse_ip6_class(int *argc_p, char ***argv_p, struct tc_u32_sel *sel),
which shifts the filter value to the 5th bit and ignores "at", since the header
position is known exactly.
Signed-off-by: Petr Lautrbach <plautrba@redhat.com>
---
Hello,
According to [1], tc doesn't generate the right filters for IPv6
"priority".
Priority, or Traffic Class, is 8 bits starting at the 5th bit,
per RFC 2460 [2]. The current version uses
parse_u8(&argc, &argv, sel, 4, 0), which offsets the filter by 4 bytes
instead of 4 bits.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=584913
[2] http://www.faqs.org/rfcs/rfc2460.html
tc/f_u32.c | 39 ++++++++++++++++++++++++++++++++++++++-
1 files changed, 38 insertions(+), 1 deletions(-)
diff --git a/tc/f_u32.c b/tc/f_u32.c
index 4f5f74e..31e13b5 100644
--- a/tc/f_u32.c
+++ b/tc/f_u32.c
@@ -403,6 +403,43 @@ static int parse_ip6_addr(int *argc_p, char ***argv_p,
return res;
}
+static int parse_ip6_class(int *argc_p, char ***argv_p, struct tc_u32_sel *sel)
+{
+ int res = -1;
+ int argc = *argc_p;
+ char **argv = *argv_p;
+ __u32 key;
+ __u32 mask;
+ int off = 0;
+ int offmask = 0;
+
+ if (argc < 2)
+ return -1;
+
+ if (get_u32(&key, *argv, 0))
+ return -1;
+ argc--; argv++;
+
+ if (get_u32(&mask, *argv, 16))
+ return -1;
+ argc--; argv++;
+
+ if (key > 0xFF || mask > 0xFF)
+ return -1;
+
+ key <<= 20;
+ mask <<= 20;
+ key = htonl(key);
+ mask = htonl(mask);
+
+ if ((res = pack_key(sel, key, mask, off, offmask)) < 0)
+ return -1;
+
+ *argc_p = argc;
+ *argv_p = argv;
+ return 0;
+}
+
static int parse_ether_addr(int *argc_p, char ***argv_p,
struct tc_u32_sel *sel, int off)
{
@@ -522,7 +559,7 @@ static int parse_ip6(int *argc_p, char ***argv_p, struct tc_u32_sel *sel)
res = parse_ip6_addr(&argc, &argv, sel, 24);
} else if (strcmp(*argv, "priority") == 0) {
NEXT_ARG();
- res = parse_u8(&argc, &argv, sel, 4, 0);
+ res = parse_ip6_class(&argc, &argv, sel);
} else if (strcmp(*argv, "protocol") == 0) {
NEXT_ARG();
res = parse_u8(&argc, &argv, sel, 6, 0);
--
1.7.1
^ permalink raw reply related
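The shift by 20 in the patch above follows from the layout of the first 32-bit word of the IPv6 header in RFC 2460: 4 bits of version, 8 bits of traffic class, 20 bits of flow label. A small sketch of that bit arithmetic follows. The helper names are hypothetical and are not part of tc:

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Traffic class in host order: bits 20..27 of the first header word. */
uint32_t demo_tclass_host(uint8_t tclass)
{
	return (uint32_t)tclass << 20;
}

/* Same value in network byte order, as a 32-bit key at offset 0 needs it. */
uint32_t demo_tclass_key(uint8_t tclass)
{
	return htonl(demo_tclass_host(tclass));
}

/* Recover the traffic class from the first two raw header bytes. */
uint8_t demo_tclass_from_bytes(const uint8_t hdr[2])
{
	return (uint8_t)(((hdr[0] & 0x0F) << 4) | (hdr[1] >> 4));
}
```

For example, a header starting with the bytes 0x6A 0xB0 carries version 6 and traffic class 0xAB, and the matching key in host order is 0x0AB00000 with mask 0x0FF00000. A one-byte match at byte offset 4 would instead look at the payload length field.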
* Re: [PATCH 8/8] af_unix: Allow connecting to sockets in other network namespaces.
From: Daniel Lezcano @ 2010-06-14 13:37 UTC (permalink / raw)
To: Eric W. Biederman
Cc: David Miller, Serge Hallyn, Linux Containers, netdev,
Pavel Emelyanov
In-Reply-To: <m11vcbgimj.fsf@fess.ebiederm.org>
On 06/13/2010 03:35 PM, Eric W. Biederman wrote:
> Remove the restriction that only allows connecting to a unix domain
> socket identified by unix path that is in the same network namespace.
>
> Crossing network namespaces is always tricky and we did not support
> this at first, because of a strict policy of don't mix the namespaces.
> Later after Pavel proposed this we did not support this because no one
> had performed the audit to make certain using unix domain sockets
> across namespaces is safe.
>
> What fundamentally makes connecting to af_unix sockets in other
> namespaces safe is that you have to have the proper permissions on
> the unix domain socket inode that lives in the filesystem. If you
> want strict isolation you just don't create inodes where unfriendlys
> can get at them, or with permissions that allow unfriendlys to open
> them. All nicely handled for us by the mount namespace and other
> standard file system facilities.
>
> I looked through unix domain sockets and they are a very controlled
> environment, so none of the work that goes on in dev_forward_skb to
> make crossing namespaces safe appears needed. We are not losing
> control of the skb and so do not need to set up the skb to look like
> it is coming in fresh from the outside world. Further, the fields in
> struct unix_skb_parms should not have any problems crossing network
> namespaces.
>
> Now that we handle SCM_CREDENTIALS in a way that gives usable values
> across namespaces, there do not appear to be any operational
> problems with encouraging the use of unix domain sockets across
> containers either.
>
> Signed-off-by: Eric W. Biederman<ebiederm@xmission.com>
>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
^ permalink raw reply
* Re: [PATCH 5/8] af_netlink: Add needed scm_destroy after scm_send.
From: Daniel Lezcano @ 2010-06-14 13:37 UTC (permalink / raw)
To: Eric W. Biederman
Cc: David Miller, Serge Hallyn, Linux Containers, netdev,
Pavel Emelyanov
In-Reply-To: <m1ljajgiud.fsf@fess.ebiederm.org>
On 06/13/2010 03:31 PM, Eric W. Biederman wrote:
> scm_send occasionally allocates state in the scm_cookie, so I have
> modified netlink_sendmsg to guarantee that when scm_send succeeds
> scm_destroy will be called to free that state.
>
> Signed-off-by: Eric W. Biederman<ebiederm@xmission.com>
> ---
>
Reviewed-by: Daniel Lezcano <daniel.lezcano@free.fr>
^ permalink raw reply
* Re: [PATCH 4/8] af_unix: Allow SO_PEERCRED to work across namespaces.
From: Daniel Lezcano @ 2010-06-14 13:37 UTC (permalink / raw)
To: Eric W. Biederman
Cc: David Miller, Serge Hallyn, Linux Containers, netdev,
Pavel Emelyanov
In-Reply-To: <m1r5kbgivt.fsf@fess.ebiederm.org>
On 06/13/2010 03:30 PM, Eric W. Biederman wrote:
> Use struct pid and struct cred to store the peer credentials on struct
> sock. This gives enough information to convert the peer credential
> information to a value relative to whatever namespace the socket is in
> at the time.
>
> This removes nasty surprises when using SO_PEERCRED on socket
> connections where the processes on either side are in different pid and
> user namespaces.
>
> Signed-off-by: Eric W. Biederman<ebiederm@xmission.com>
>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
^ permalink raw reply
* Re: [PATCH v2] net: deliver skbs on inactive slaves to exact matches
From: Eric Dumazet @ 2010-06-14 13:38 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: John Fastabend, fubar, davem, nhorman, bonding-devel, netdev
In-Reply-To: <1276522483.2478.88.camel@edumazet-laptop>
On Monday, June 14, 2010 at 15:34 +0200, Eric Dumazet wrote:
> [PATCH] net: fix deliver_no_wcard regression on loopback device
>
> deliver_no_wcard is not being set in skb_copy_header.
> In the skb_cloned case it is not being cleared and
> may cause the skb to be dropped when the loopback device
> pushes it back up the stack.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Oh, I forgot:
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> ---
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 9f07e74..bcf2fa3 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -532,6 +532,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
> new->ip_summed = old->ip_summed;
> skb_copy_queue_mapping(new, old);
> new->priority = old->priority;
> + new->deliver_no_wcard = old->deliver_no_wcard;
> #if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
> new->ipvs_property = old->ipvs_property;
> #endif
>
^ permalink raw reply
* Re: [PATCH v2] net: deliver skbs on inactive slaves to exact matches
From: Eric Dumazet @ 2010-06-14 13:34 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: John Fastabend, fubar, davem, nhorman, bonding-devel, netdev
In-Reply-To: <20100614132120.GA24785@redhat.com>
From: John Fastabend <john.r.fastabend@intel.com>
On Monday, June 14, 2010 at 16:21 +0300, Michael S. Tsirkin wrote:
> On Thu, Jun 03, 2010 at 12:30:11PM -0700, John Fastabend wrote:
> > Currently, the accelerated receive path for VLAN's will
> > drop packets if the real device is an inactive slave and
> > is not one of the special pkts tested for in
> > skb_bond_should_drop(). This behavior is different than
> > the non-accelerated path and for pkts over a bonded vlan.
> >
> > For example,
> >
> > vlanx -> bond0 -> ethx
> >
> > will be dropped in the vlan path and not delivered to any
> > packet handlers at all. However,
> >
> > bond0 -> vlanx -> ethx
> >
> > and
> >
> > bond0 -> ethx
> >
> > will be delivered to handlers that match the exact dev,
> > because the VLAN path checks the real_dev which is not a
> > slave and netif_recv_skb() doesn't drop frames but only
> > delivers them to exact matches.
> >
> > This patch adds a sk_buff flag which is used for tagging
> > skbs that would previously have been dropped and allows the
> > skb to continue to skb_netif_recv(). Here we add
> > logic to check for the deliver_no_wcard flag and if it
> > is set only deliver to handlers that match exactly. This
> > makes both paths above consistent and gives pkt handlers
> > a way to identify skbs that come from inactive slaves.
> > Without this patch in some configurations skbs will be
> > delivered to handlers with exact matches and in others
> > be dropped outright in the vlan path.
> >
> > I have tested the following 4 configurations in failover modes
> > and load balancing modes.
> >
> > # bond0 -> ethx
> >
> > # vlanx -> bond0 -> ethx
> >
> > # bond0 -> vlanx -> ethx
> >
> > # bond0 -> ethx
> > |
> > vlanx -> --
> >
> > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>
> I am using qemu with both tap and slirp (userspace) networking.
> This works fine under 2.6.35-rc2 but breaks under 2.6.35-rc3:
> ssh over slirp stops working sometimes right away
> and sometimes after a bit of use, connection times out.
>
> Git bisect gave me this commit:
> 597a264b1a9c7e36d1728f677c66c5c1f7e3b837.
>
> Reverting 597a264b1a9c7e36d1728f677c66c5c1f7e3b837 fixes the issue
> for me.
>
> I'm short for time now so didn't debug this further.
> I opened a bugzilla to track this issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=16204
>
A fix is already available, and this bug has already been reported multiple times.
http://lkml.org/lkml/2010/6/13/155
[PATCH] net: fix deliver_no_wcard regression on loopback device
deliver_no_wcard is not being set in skb_copy_header.
In the skb_cloned case it is not being cleared and
may cause the skb to be dropped when the loopback device
pushes it back up the stack.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9f07e74..bcf2fa3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -532,6 +532,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
new->ip_summed = old->ip_summed;
skb_copy_queue_mapping(new, old);
new->priority = old->priority;
+ new->deliver_no_wcard = old->deliver_no_wcard;
#if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
new->ipvs_property = old->ipvs_property;
#endif
^ permalink raw reply related
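The one-line fix above can be pictured with a reduced model: if a header-copy helper skips a field, the destination keeps whatever value it held before the copy. The sketch assumes the destination may start with the flag set. demo_skb and both copy helpers are hypothetical and only mimic the shape of __copy_skb_header():

```c
/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct demo_skb {
	unsigned int priority;
	unsigned int deliver_no_wcard:1;
};

/* Header copy that forgets the flag: dst keeps its previous value. */
void demo_copy_header_old(struct demo_skb *dst, const struct demo_skb *src)
{
	dst->priority = src->priority;
}

/* Header copy with the fix applied: the flag follows the source. */
void demo_copy_header_fixed(struct demo_skb *dst, const struct demo_skb *src)
{
	dst->priority = src->priority;
	dst->deliver_no_wcard = src->deliver_no_wcard;
}
```

In this model a destination that starts with the flag set still has it set after the old copy, even when the source has it clear. After the fixed copy the flag matches the source.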
* potential race in virtio ring?
From: Michael S. Tsirkin @ 2010-06-14 13:59 UTC (permalink / raw)
To: virtualization, Rusty Russell, Jiri Pirko, Shirley Ma, netdev,
linux-kernel
Hi!
I was going over the vring code and noticed that
the ring has this check:
irqreturn_t vring_interrupt(int irq, void *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
if (!more_used(vq)) {
pr_debug("virtqueue interrupt with no work for %p\n", vq);
return IRQ_NONE;
static inline bool more_used(const struct vring_virtqueue *vq)
{
return vq->last_used_idx != vq->vring.used->idx;
}
My concern is that with virtio net, more_used is called
on a CPU different from the one that polls the vq.
This might mean that the last_used_idx value it sees is stale.
Could this lead to a missed interrupt?
Thanks,
--
MST
^ permalink raw reply
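To make the check in the question above concrete, here is a single-threaded model of the index comparison. It deliberately does not model memory ordering between CPUs, so it does not settle the question. It only shows what the comparison computes. demo_vq and the helper names are hypothetical simplifications of struct vring_virtqueue:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction of the virtqueue state used by the check. */
struct demo_vq {
	uint16_t last_used_idx;	/* driver side: next used entry to consume */
	uint16_t used_idx;	/* stand-in for vring.used->idx */
};

/* Same comparison as more_used(): any difference means pending work. */
bool demo_more_used(const struct demo_vq *vq)
{
	return vq->last_used_idx != vq->used_idx;
}

/* Pending entries; 16-bit subtraction copes with index wrap-around. */
uint16_t demo_pending(const struct demo_vq *vq)
{
	return (uint16_t)(vq->used_idx - vq->last_used_idx);
}
```

In this model, an observer holding an older last_used_idx than the poller's current value sees a difference and reports work. Whether a real CPU can observe a combination that hides work depends on ordering guarantees that this sketch leaves out.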
* Re: [PATCH net-next-2.6] ip_frag: Remove some atomic ops
From: Shan Wei @ 2010-06-14 14:01 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Patrick McHardy, netfilter-devel
In-Reply-To: <1276506144.2478.40.camel@edumazet-laptop>
Eric Dumazet wrote, at 06/14/2010 05:02 PM:
> Instead of doing one atomic operation per frag, we can factorize them.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
IPv6 netfilter implements its own queue to manage and reassemble fragments,
so you missed this one.
[PATCH 1/2] netfilter: defrag: remove one redundant atomic ops
Instead of doing one atomic operation per frag, we can factorize them.
Reported by Eric Dumazet.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
net/ipv6/netfilter/nf_conntrack_reasm.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 6fb8901..bc5b86d 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -442,7 +442,6 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
skb_shinfo(head)->frag_list = head->next;
skb_reset_transport_header(head);
skb_push(head, head->data - skb_network_header(head));
- atomic_sub(head->truesize, &nf_init_frags.mem);
for (fp=head->next; fp; fp = fp->next) {
head->data_len += fp->len;
@@ -452,8 +451,8 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
else if (head->ip_summed == CHECKSUM_COMPLETE)
head->csum = csum_add(head->csum, fp->csum);
head->truesize += fp->truesize;
- atomic_sub(fp->truesize, &nf_init_frags.mem);
}
+ atomic_sub(head->truesize, &nf_init_frags.mem);
head->next = NULL;
head->dev = dev;
^ permalink raw reply related
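The factorization in the patch above replaces one atomic subtraction per fragment with a single subtraction of the accumulated total. A userspace sketch using C11 atomics is given below. demo_uncharge_each() and demo_uncharge_batched() are hypothetical names, and the returned count of atomic operations is only there for illustration:

```c
#include <stdatomic.h>
#include <stddef.h>

/* One atomic subtraction per fragment; returns the number of atomic ops. */
int demo_uncharge_each(atomic_int *mem, const int *truesize, size_t n)
{
	int ops = 0;

	for (size_t i = 0; i < n; i++) {
		atomic_fetch_sub(mem, truesize[i]);
		ops++;
	}
	return ops;
}

/* Sum with plain arithmetic first, then issue a single atomic subtraction. */
int demo_uncharge_batched(atomic_int *mem, const int *truesize, size_t n)
{
	int total = 0;

	for (size_t i = 0; i < n; i++)
		total += truesize[i];
	atomic_fetch_sub(mem, total);
	return 1;
}
```

Both variants leave the counter at the same final value. The batched one touches the shared counter once regardless of the number of fragments, which mirrors how the patch subtracts head->truesize once after the loop has accumulated every fragment's truesize into it.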
* Re: [PATCH net-next-2.6] ipfrag : frag_kfree_skb() cleanup
From: Shan Wei @ 2010-06-14 14:01 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Patrick McHardy, netfilter-devel
In-Reply-To: <1276507363.2478.43.camel@edumazet-laptop>
Eric Dumazet wrote, at 06/14/2010 05:22 PM:
> Third param (work) is unused, remove it.
>
> Remove __inline__ and inline qualifiers.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
We also need to fix IPv6 netfilter.
[PATCH 2/2] netfilter: defrag: kill unused work parameter of frag_kfree_skb()
The parameter (work) is unused, remove it.
Reported by Eric Dumazet.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
net/ipv6/netfilter/nf_conntrack_reasm.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index bc5b86d..9254008 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -114,10 +114,8 @@ static void nf_skb_free(struct sk_buff *skb)
}
/* Memory Tracking Functions. */
-static inline void frag_kfree_skb(struct sk_buff *skb, unsigned int *work)
+static void frag_kfree_skb(struct sk_buff *skb)
{
- if (work)
- *work -= skb->truesize;
atomic_sub(skb->truesize, &nf_init_frags.mem);
nf_skb_free(skb);
kfree_skb(skb);
@@ -335,7 +333,7 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
fq->q.fragments = next;
fq->q.meat -= free_it->len;
- frag_kfree_skb(free_it, NULL);
+ frag_kfree_skb(free_it);
}
}
--
1.6.3.3
^ permalink raw reply related
* Re: mpd client timeouts (bisected) 2.6.35-rc3
From: Michael S. Tsirkin @ 2010-06-14 14:13 UTC (permalink / raw)
To: David Miller
Cc: john.r.fastabend, markus, linux-kernel, netdev, yanmin_zhang,
alex.shi, tim.c.chen
In-Reply-To: <20100613.171318.193707256.davem@davemloft.net>
On Sun, Jun 13, 2010 at 05:13:18PM -0700, David Miller wrote:
> From: John Fastabend <john.r.fastabend@intel.com>
> Date: Sun, 13 Jun 2010 13:36:30 -0700
>
> > The wcard bit needs to be set in copy_skb_header; otherwise it will not
> > be cleared when called from skb_clone. The skb then hits the loopback
> > device, gets pushed into the rx path, and is eventually dropped. The
> > following patch fixes this. Hopefully, this is easy and fast enough
> > for you, Dave.
> >
> >
> > [PATCH] net: fix deliver_no_wcard regression on loopback device
> >
> > deliver_no_wcard is not being set in skb_copy_header.
> > In the skb_cloned case it is not being cleared and
> > may cause the skb to be dropped when the loopback device
> > pushes it back up the stack.
> >
> > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>
> Applied, but your email client corrupted this patch in many
> ways. Please correct this for next time, thanks.
FWIW:
Tested-by: Michael S. Tsirkin <mst@redhat.com>
--
MST
^ permalink raw reply
* Re: [PATCH v2] net: deliver skbs on inactive slaves to exact matches
From: Michael S. Tsirkin @ 2010-06-14 14:10 UTC (permalink / raw)
To: Eric Dumazet; +Cc: John Fastabend, fubar, davem, nhorman, bonding-devel, netdev
In-Reply-To: <1276522708.2478.89.camel@edumazet-laptop>
On Mon, Jun 14, 2010 at 03:38:28PM +0200, Eric Dumazet wrote:
> On Monday, 14 June 2010 at 15:34 +0200, Eric Dumazet wrote:
>
> > [PATCH] net: fix deliver_no_wcard regression on loopback device
> >
> > deliver_no_wcard is not being set in skb_copy_header.
> > In the skb_cloned case it is not being cleared and
> > may cause the skb to be dropped when the loopback device
> > pushes it back up the stack.
> >
> > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> Oh I forgot :
>
> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Grr. Could have saved myself a bit of time if I had guessed
it was related.
> > ---
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 9f07e74..bcf2fa3 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -532,6 +532,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
> > new->ip_summed = old->ip_summed;
> > skb_copy_queue_mapping(new, old);
> > new->priority = old->priority;
> > + new->deliver_no_wcard = old->deliver_no_wcard;
> > #if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
> > new->ipvs_property = old->ipvs_property;
> > #endif
> >
>
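For illustration, the failure mode described in the commit message can be sketched in user space. This is only a sketch under stated assumptions: struct pkt_hdr, make_stale(), copy_hdr_buggy() and copy_hdr_fixed() are hypothetical names, not kernel code, and the 0xff fill merely simulates a destination buffer that still holds old contents. It shows that a field-by-field header copy leaves any field it does not list at whatever value the destination already had.

```c
/* User-space sketch only; all names below are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down stand-in for the header fields of struct sk_buff. */
struct pkt_hdr {
	unsigned int priority;
	unsigned char ip_summed;
	unsigned char deliver_no_wcard;
};

/* Simulate a destination that still holds old, non-zero contents. */
void make_stale(struct pkt_hdr *dst)
{
	assert(dst != NULL);
	memset(dst, 0xff, sizeof(*dst));
}

/* Field-by-field copy that forgets deliver_no_wcard:
 * the destination keeps its stale value for that field. */
void copy_hdr_buggy(struct pkt_hdr *dst, const struct pkt_hdr *src)
{
	dst->ip_summed = src->ip_summed;
	dst->priority = src->priority;
}

/* Same copy with the one extra assignment that the patch adds. */
void copy_hdr_fixed(struct pkt_hdr *dst, const struct pkt_hdr *src)
{
	dst->ip_summed = src->ip_summed;
	dst->priority = src->priority;
	dst->deliver_no_wcard = src->deliver_no_wcard;
}
```

With a source header whose deliver_no_wcard is 0, the buggy copy leaves the flag set in a stale destination, while the fixed copy clears it.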
^ permalink raw reply
* Re: [PATCH net-next-2.6] ip_frag: Remove some atomic ops
From: Eric Dumazet @ 2010-06-14 14:18 UTC (permalink / raw)
To: Shan Wei; +Cc: David Miller, netdev, Patrick McHardy, netfilter-devel
In-Reply-To: <4C163622.7080003@cn.fujitsu.com>
On Monday, 14 June 2010 at 22:01 +0800, Shan Wei wrote:
> Eric Dumazet wrote, at 06/14/2010 05:02 PM:
> > Instead of doing one atomic operation per frag, we can factorize them.
> >
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> IPv6 netfilter implements its own queue to manage and reassemble fragments,
> so you missed this one.
>
Not exactly missed, it's just a different thing :)
I prefer to keep net patches (David) and netfilter ones (Patrick)
separate where possible, because of the delay between git trees.
> [PATCH 1/2] netfilter: defrag: remove one redundant atomic ops
>
> Instead of doing one atomic operation per frag, we can factorize them.
> Reported from Eric Dumazet.
>
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> net/ipv6/netfilter/nf_conntrack_reasm.c | 3 +--
> 1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
> index 6fb8901..bc5b86d 100644
> --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
> +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
> @@ -442,7 +442,6 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
> skb_shinfo(head)->frag_list = head->next;
> skb_reset_transport_header(head);
> skb_push(head, head->data - skb_network_header(head));
> - atomic_sub(head->truesize, &nf_init_frags.mem);
>
> for (fp=head->next; fp; fp = fp->next) {
> head->data_len += fp->len;
> @@ -452,8 +451,8 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
> else if (head->ip_summed == CHECKSUM_COMPLETE)
> head->csum = csum_add(head->csum, fp->csum);
> head->truesize += fp->truesize;
> - atomic_sub(fp->truesize, &nf_init_frags.mem);
> }
> + atomic_sub(head->truesize, &nf_init_frags.mem);
>
> head->next = NULL;
> head->dev = dev;
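For illustration, the factorization in the patch quoted above can be sketched in user space. This is a sketch under stated assumptions, not the kernel code: C11 atomic_int stands in for the kernel's atomic_t, and struct frag, uncharge_per_frag() and uncharge_batched() are hypothetical names. Both variants remove the same total from the counter; the batched one issues a single atomic operation instead of one per fragment.

```c
/* User-space sketch only: C11 atomics stand in for the kernel's atomic_t. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct sk_buff. */
struct frag {
	unsigned int truesize;
	struct frag *next;
};

/* Old scheme: uncharge the head, then each fragment, one atomic op apiece.
 * Returns the number of atomic operations issued. */
int uncharge_per_frag(struct frag *head, atomic_int *mem)
{
	int ops = 1;

	assert(head != NULL);
	atomic_fetch_sub(mem, (int)head->truesize);
	for (struct frag *fp = head->next; fp != NULL; fp = fp->next) {
		head->truesize += fp->truesize;
		atomic_fetch_sub(mem, (int)fp->truesize);
		ops++;
	}
	return ops;
}

/* New scheme: fold every fragment into head->truesize first, then
 * uncharge the accumulated total with a single atomic op. */
int uncharge_batched(struct frag *head, atomic_int *mem)
{
	assert(head != NULL);
	for (struct frag *fp = head->next; fp != NULL; fp = fp->next)
		head->truesize += fp->truesize;
	atomic_fetch_sub(mem, (int)head->truesize);
	return 1;
}
```

The batched form relies on head->truesize already accumulating every fragment's truesize inside the loop, so the sum is available for free once the loop ends.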
^ permalink raw reply
* Re: [PATCH net-next-2.6] ipfrag : frag_kfree_skb() cleanup
From: Eric Dumazet @ 2010-06-14 14:20 UTC (permalink / raw)
To: Shan Wei; +Cc: David Miller, netdev, Patrick McHardy, netfilter-devel
In-Reply-To: <4C163643.1080906@cn.fujitsu.com>
On Monday, 14 June 2010 at 22:01 +0800, Shan Wei wrote:
> Eric Dumazet wrote, at 06/14/2010 05:22 PM:
> > Third param (work) is unused, remove it.
> >
> > Remove __inline__ and inline qualifiers.
> >
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> we also need to fix IPv6 netfilter.
>
Well, 'fix' is not the right word; there is no bug ;)
> [PATCH 2/2] netfilter: defrag: kill unused work parameter of frag_kfree_skb()
>
> The parameter (work) is unused, remove it.
> Reported from Eric Dumazet.
>
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> net/ipv6/netfilter/nf_conntrack_reasm.c | 6 ++----
> 1 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
> index bc5b86d..9254008 100644
> --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
> +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
> @@ -114,10 +114,8 @@ static void nf_skb_free(struct sk_buff *skb)
> }
>
> /* Memory Tracking Functions. */
> -static inline void frag_kfree_skb(struct sk_buff *skb, unsigned int *work)
> +static void frag_kfree_skb(struct sk_buff *skb)
> {
> - if (work)
> - *work -= skb->truesize;
> atomic_sub(skb->truesize, &nf_init_frags.mem);
> nf_skb_free(skb);
> kfree_skb(skb);
> @@ -335,7 +333,7 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
> fq->q.fragments = next;
>
> fq->q.meat -= free_it->len;
> - frag_kfree_skb(free_it, NULL);
> + frag_kfree_skb(free_it);
> }
> }
>
^ permalink raw reply
* Re: [PATCH net-next-2.6] ip_frag: Remove some atomic ops
From: Patrick McHardy @ 2010-06-14 14:30 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Shan Wei, David Miller, netdev, netfilter-devel
In-Reply-To: <1276525084.2478.92.camel@edumazet-laptop>
Eric Dumazet wrote:
> On Monday, 14 June 2010 at 22:01 +0800, Shan Wei wrote:
>
>> [PATCH 1/2] netfilter: defrag: remove one redundant atomic ops
>>
>> Instead of doing one atomic operation per frag, we can factorize them.
>> Reported from Eric Dumazet.
>>
>> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
>>
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>
Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net-next-2.6] ipfrag : frag_kfree_skb() cleanup
From: Patrick McHardy @ 2010-06-14 14:32 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Shan Wei, David Miller, netdev, netfilter-devel
In-Reply-To: <1276525212.2478.93.camel@edumazet-laptop>
Eric Dumazet wrote:
> On Monday, 14 June 2010 at 22:01 +0800, Shan Wei wrote:
>
>> [PATCH 2/2] netfilter: defrag: kill unused work parameter of frag_kfree_skb()
>>
>> The parameter (work) is unused, remove it.
>> Reported from Eric Dumazet.
>>
>> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
>>
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>
>
Also applied, thanks.
^ permalink raw reply