* Why tcp_sacktag_walk specially process next_dup?
From: Yi Li @ 2012-12-25 10:35 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
Hi Ilpo,
I am a kernel newbie, maybe this question is simple.
If you have some free time, could you help me ?
I am reading your commit
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=68f8353b480e5f2e136c38a511abdbb88eaa8ce2,
through this code path:
tcp_sacktag_write_queue() {
if (tcp_sack_cache_ok(tp, cache) && !dup_sack &&
after(end_seq, cache->start_seq)) {
/* Head todo? */
if (before(start_seq, cache->start_seq)) {
skb = tcp_sacktag_skip(skb, sk, &state,
start_seq);
skb = tcp_sacktag_walk(skb, sk, next_dup,
&state,
start_seq,
cache->start_seq,
dup_sack);
}
}
and when we come to tcp_sacktag_walk(), comparing the current processing
sack block
with cache, we have: start_seq < cache->start_seq, and we now need to
process the
bytes between (start_seq, cache->start_seq) in tcp_write_queue.
But in tcp_sacktag_walk(), why we first check the seqence space in
next_dup ?
I know this is about D-SACK, and I have read the rfc2883, but I am still
confused.
I have some questions:
1. Why we introduce a next_dup variable in SACK processing, is it better
for performance optimization?
As there is dup_sack variable, will this pre-processing of sack
block be mixed with dup_sack ?
2. What does this test statement means in tcp_sacktag_walk:
if ((next_dup != NULL) &&
before(TCP_SKB_CB(skb)->seq, next_dup->end_seq)) {
---------------------> A
in_sack = tcp_match_skb_to_sack(sk, skb,
next_dup->start_seq,
next_dup->end_seq);
if (in_sack > 0)
dup_sack = true;
}
as far as i know, if tcp_skb_pcout(skb)>1, this condition maybe exist:
skb->seq < current_sack_block.start_seq <
current_sack_block.end_seq < next_dup->start_seq < next_dup->end_seq.
So, I do not understand what the code A really does.
^ permalink raw reply
* Re: kernel panic when running /etc/init.d/iptables restart
From: canqun zhang @ 2012-12-25 10:45 UTC (permalink / raw)
To: Gao feng
Cc: Patrick McHardy, netfilter-devel, netfilter, linux-kernel,
netdev@vger.kernel.org
In-Reply-To: <50D965F9.7090007@cn.fujitsu.com>
Thanks for your suggestion,i will modify this patch and take tests.
2012/12/25 Gao feng <gaofeng@cn.fujitsu.com>:
> On 2012/12/25 15:25, canqun zhang wrote:
>> Hi Gao feng
>> The stack information is as follows. The kenel will panic because the
>> nf_ct_destroy is NULL.
>>
>> Reproduction:
>> (1) starting a lxc container
>> (2) iptables -t nat -A POSTROUTING -s 10.48.254.18 -o eth1 -j
>> MASQUERADE (run it on host machine)
>> (3) /etc/ini.d/iptables save (run it on host machine)
>> (4)/etc/init.d/iptables restart (run it on host machine)
>>
>
> Thanks!
> It seems that nf_conntrack_l[3,4]proto_unregister doesn't make sure
> nf_conns of the proto being destroyed.
>
> If I'm right, there is another problem even your fix this panic problem.
> the l3,14proto will be unregistered before all of it's nf_conns being destroyed.
> So even nf_ct_destroy is not NULL,in destroy_conntrack we are not able to
> find the right l4proto,the l4proto->destroy will be incorrect.resources will
> not be released correctly.
>
> So I think the root problem is we do register/unregister, set/unset both on the
> first net (init_net), Maybe it's better to do register set on the first net, and
> do unregister unset on the last net.
^ permalink raw reply
* Re: [PATCH 19/29] batman-adv: fix random jitter calculation
From: Antonio Quartulli @ 2012-12-25 11:26 UTC (permalink / raw)
To: Akinobu Mita
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Simon Wunderlich,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Marek Lindner,
David S. Miller
In-Reply-To: <1356315256-6572-20-git-send-email-akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 1372 bytes --]
On Mon, Dec 24, 2012 at 11:14:06AM +0900, Akinobu Mita wrote:
> batadv_iv_ogm_emit_send_time() attempts to calculates a random integer
> in the range of 'orig_interval +- BATADV_JITTER' by the below lines.
>
> msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER;
> msecs += (random32() % 2 * BATADV_JITTER);
>
> But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER'
> because '%' and '*' have same precedence and associativity is
> left-to-right.
>
> This adds the parentheses at the appropriate position so that it matches
> original intension.
>
> Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Marek Lindner <lindner_marek-LWAfsSFWpa4@public.gmane.org>
> Cc: Simon Wunderlich <siwu-MaAgPAbsBIVS8oHt8HbXEIQuADTiUCJX@public.gmane.org>
> Cc: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
> Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r@public.gmane.org
> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---
Acked-by: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
But I would suggest to apply this change to net, since it is a fix.
Cheers,
--
Antonio Quartulli
..each of us alone is worth nothing..
Ernesto "Che" Guevara
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply
* Re: [PATCH 20/29] batman-adv: rename random32() to prandom_u32()
From: Antonio Quartulli @ 2012-12-25 11:30 UTC (permalink / raw)
To: Akinobu Mita
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Simon Wunderlich,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Marek Lindner,
David S. Miller
In-Reply-To: <1356315256-6572-21-git-send-email-akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 832 bytes --]
On Mon, Dec 24, 2012 at 11:14:07AM +0900, Akinobu Mita wrote:
> Use more preferable function name which implies using a pseudo-random
> number generator.
>
> Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Marek Lindner <lindner_marek-LWAfsSFWpa4@public.gmane.org>
> Cc: Simon Wunderlich <siwu-MaAgPAbsBIVS8oHt8HbXEIQuADTiUCJX@public.gmane.org>
> Cc: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
> Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r@public.gmane.org
> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Acked-by: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
--
Antonio Quartulli
..each of us alone is worth nothing..
Ernesto "Che" Guevara
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply
* Re: [PATCH 28/29] net/: rename net_random() to prandom_u32()
From: Akinobu Mita @ 2012-12-25 11:47 UTC (permalink / raw)
To: Neil Horman
Cc: linux-kernel, akpm, Jesse Gross, Venkat Venkatsubra,
Vlad Yasevich, Sridhar Samudrala, Steffen Klassert, Herbert Xu,
David S. Miller, linux-sctp, dev, netdev
In-Reply-To: <20121225002149.GA5235@neilslaptop.think-freely.org>
2012/12/25 Neil Horman <nhorman@tuxdriver.com>:
> On Mon, Dec 24, 2012 at 11:14:15AM +0900, Akinobu Mita wrote:
>> Use more preferable function name which implies using a pseudo-random
>> number generator.
>>
>> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
>> Cc: Jesse Gross <jesse@nicira.com>
>> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
>> Cc: Vlad Yasevich <vyasevich@gmail.com>
>> Cc: Sridhar Samudrala <sri@us.ibm.com>
>> Cc: Neil Horman <nhorman@tuxdriver.com>
>> Cc: Steffen Klassert <steffen.klassert@secunet.com>
>> Cc: Herbert Xu <herbert@gondor.apana.org.au>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: linux-sctp@vger.kernel.org
>> Cc: dev@openvswitch.org
>> Cc: netdev@vger.kernel.org
>> ---
>> include/net/red.h | 2 +-
>> net/802/garp.c | 2 +-
>> net/openvswitch/actions.c | 2 +-
>> net/rds/bind.c | 2 +-
>> net/sctp/socket.c | 2 +-
>> net/xfrm/xfrm_state.c | 2 +-
>> 6 files changed, 6 insertions(+), 6 deletions(-)
>>
> I'm largely indifferent to this patch, but I kind of feel like its just churn.
> Whats the real advantage in making this change? I grant that it clearly
> indicates the type of random number generator we're using at a given call site,
> But for those using net_random, you probably don't care too much about
> the source of your random bits. If you did really want true random vs.
> pseudo-random data, you need to explicitly use the right call. You're previous
> patch series did good cleanup on differentiating the different random calls, but
> this just seems like its removing what is otherwise useful indirection.
I overlooked the importance of net_random() indirection.
Thanks for the feedback. I'll leave all net_random() callers as-is in
the next version.
^ permalink raw reply
* Re: [PATCH 1/2] IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len
From: Bart Van Assche @ 2012-12-25 12:10 UTC (permalink / raw)
To: Mike Marciniszyn
Cc: venkat.x.venkatsubra-QHcLZuEGTsvQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A,
rds-devel-N0ozoZBvEnrZJqsBc5GL+g, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20121221180149.23716.54707.stgit-K+u1se/DcYrLESAwzcoQNrvm/XP+8Wra@public.gmane.org>
On 12/21/12 19:01, Mike Marciniszyn wrote:
> 0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs")
> added uses of sg_dma_len() and sg_dma_address(). This makes
> RDS DOA with the qib driver.
>
> IB ulps should use ib_sg_dma_len() and ib_sg_dma_address
> respectively since some HCAs overload ib_sg_dma* operations.
>
> Signed-off-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> ---
> net/rds/ib_recv.c | 9 ++++++---
> 1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
> index 8c5bc85..8eb9501 100644
> --- a/net/rds/ib_recv.c
> +++ b/net/rds/ib_recv.c
> @@ -339,8 +339,8 @@ static int rds_ib_recv_refill_one(struct rds_connection *conn,
> sge->length = sizeof(struct rds_header);
>
> sge = &recv->r_sge[1];
> - sge->addr = sg_dma_address(&recv->r_frag->f_sg);
> - sge->length = sg_dma_len(&recv->r_frag->f_sg);
> + sge->addr = ib_sg_dma_address(ic->i_cm_id->device, &recv->r_frag->f_sg);
> + sge->length = ib_sg_dma_len(ic->i_cm_id->device, &recv->r_frag->f_sg);
>
> ret = 0;
> out:
> @@ -381,7 +381,10 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill)
> ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
> rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
> recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
> - (long) sg_dma_address(&recv->r_frag->f_sg), ret);
> + (long) ib_sg_dma_address(
> + ic->i_cm_id->device,
> + &recv->r_frag->f_sg),
> + ret);
> if (ret) {
> rds_ib_conn_error(conn, "recv post on "
> "%pI4 returned %d, disconnecting and "
The above patch will slow down RDS for anyone who is not using QLogic
HCA's, isn' it ? How about replacing the above patch with the
(untested) patch below ?
diff --git a/drivers/infiniband/hw/ehca/ehca_mrmw.c b/drivers/infiniband/hw/ehca/ehca_mrmw.c
index 8784486..ec9adf8 100644
--- a/drivers/infiniband/hw/ehca/ehca_mrmw.c
+++ b/drivers/infiniband/hw/ehca/ehca_mrmw.c
@@ -2588,16 +2588,6 @@ static void ehca_dma_unmap_sg(struct ib_device *dev, struct scatterlist *sg,
/* This is only a stub; nothing to be done here */
}
-static u64 ehca_dma_address(struct ib_device *dev, struct scatterlist *sg)
-{
- return sg->dma_address;
-}
-
-static unsigned int ehca_dma_len(struct ib_device *dev, struct scatterlist *sg)
-{
- return sg->length;
-}
-
static void ehca_dma_sync_single_for_cpu(struct ib_device *dev, u64 addr,
size_t size,
enum dma_data_direction dir)
@@ -2650,8 +2640,6 @@ struct ib_dma_mapping_ops ehca_dma_mapping_ops = {
.unmap_page = ehca_dma_unmap_page,
.map_sg = ehca_dma_map_sg,
.unmap_sg = ehca_dma_unmap_sg,
- .dma_address = ehca_dma_address,
- .dma_len = ehca_dma_len,
.sync_single_for_cpu = ehca_dma_sync_single_for_cpu,
.sync_single_for_device = ehca_dma_sync_single_for_device,
.alloc_coherent = ehca_dma_alloc_coherent,
diff --git a/drivers/infiniband/hw/qib/qib_dma.c b/drivers/infiniband/hw/qib/qib_dma.c
index 2920bb3..3620c6c 100644
--- a/drivers/infiniband/hw/qib/qib_dma.c
+++ b/drivers/infiniband/hw/qib/qib_dma.c
@@ -104,7 +104,12 @@ static int qib_map_sg(struct ib_device *dev, struct scatterlist *sgl,
for_each_sg(sgl, sg, nents, i) {
addr = (u64) page_address(sg_page(sg));
/* TODO: handle highmem pages */
- if (!addr) {
+ if (addr) {
+ sg->dma_address = addr + sg->offset;
+#ifdef CONFIG_NEED_SG_DMA_LENGTH
+ sg->dma_length = sg->length;
+#endif
+ } else {
ret = 0;
break;
}
@@ -119,21 +124,6 @@ static void qib_unmap_sg(struct ib_device *dev,
BUG_ON(!valid_dma_direction(direction));
}
-static u64 qib_sg_dma_address(struct ib_device *dev, struct scatterlist *sg)
-{
- u64 addr = (u64) page_address(sg_page(sg));
-
- if (addr)
- addr += sg->offset;
- return addr;
-}
-
-static unsigned int qib_sg_dma_len(struct ib_device *dev,
- struct scatterlist *sg)
-{
- return sg->length;
-}
-
static void qib_sync_single_for_cpu(struct ib_device *dev, u64 addr,
size_t size, enum dma_data_direction dir)
{
@@ -173,8 +163,6 @@ struct ib_dma_mapping_ops qib_dma_mapping_ops = {
.unmap_page = qib_dma_unmap_page,
.map_sg = qib_map_sg,
.unmap_sg = qib_unmap_sg,
- .dma_address = qib_sg_dma_address,
- .dma_len = qib_sg_dma_len,
.sync_single_for_cpu = qib_sync_single_for_cpu,
.sync_single_for_device = qib_sync_single_for_device,
.alloc_coherent = qib_dma_alloc_coherent,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 46bc045..54a0689 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1049,10 +1049,6 @@ struct ib_dma_mapping_ops {
void (*unmap_sg)(struct ib_device *dev,
struct scatterlist *sg, int nents,
enum dma_data_direction direction);
- u64 (*dma_address)(struct ib_device *dev,
- struct scatterlist *sg);
- unsigned int (*dma_len)(struct ib_device *dev,
- struct scatterlist *sg);
void (*sync_single_for_cpu)(struct ib_device *dev,
u64 dma_handle,
size_t size,
@@ -1863,12 +1859,13 @@ static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
* ib_sg_dma_address - Return the DMA address from a scatter/gather entry
* @dev: The device for which the DMA addresses were created
* @sg: The scatter/gather entry
+ *
+ * Note: this function is obsolete. To do: change all occurrences of
+ * ib_sg_dma_address() into sg_dma_address().
*/
static inline u64 ib_sg_dma_address(struct ib_device *dev,
struct scatterlist *sg)
{
- if (dev->dma_ops)
- return dev->dma_ops->dma_address(dev, sg);
return sg_dma_address(sg);
}
@@ -1876,12 +1873,13 @@ static inline u64 ib_sg_dma_address(struct ib_device *dev,
* ib_sg_dma_len - Return the DMA length from a scatter/gather entry
* @dev: The device for which the DMA addresses were created
* @sg: The scatter/gather entry
+ *
+ * Note: this function is obsolete. To do: change all occurrences of
+ * ib_sg_dma_len() into sg_dma_len().
*/
static inline unsigned int ib_sg_dma_len(struct ib_device *dev,
struct scatterlist *sg)
{
- if (dev->dma_ops)
- return dev->dma_ops->dma_len(dev, sg);
return sg_dma_len(sg);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [ANNOUNCE] iptables 1.4.17 release
From: Pablo Neira Ayuso @ 2012-12-25 13:09 UTC (permalink / raw)
To: netfilter-devel; +Cc: netdev, netfilter, netfilter-announce, lwn
[-- Attachment #1: Type: text/plain, Size: 429 bytes --]
Hi!
The Netfilter project proudly presents:
iptables 1.4.17
More relevantly, this release includes the IPv6 NAT extensions from
Patrick McHardy, the ignore day transition from Florian Westphal and
a couple of fixes.
See ChangeLog that comes attached to this email for more details.
You can download it from:
http://www.netfilter.org/projects/iptables/downloads.html
ftp://ftp.netfilter.org/pub/iptables/
Have fun!
[-- Attachment #2: changes-iptables-1.4.17.txt --]
[-- Type: text/plain, Size: 744 bytes --]
Florian Westphal (1):
libxt_time: add support to ignore day transition
Jozsef Kadlecsik (1):
Manpage update: matches are evaluated in the order they are specified.
Pablo Neira Ayuso (2):
Merge branch 'next' branch that contains new features scheduled for Linux kernel 3.7
bump version to 1.4.17
Patrick McHardy (7):
Convert the NAT targets to use the kernel supplied nf_nat.h header
extensions: add IPv6 MASQUERADE extension
extensions: add IPv6 SNAT extension
extensions: add IPv6 DNAT target
extensions: add IPv6 REDIRECT extension
extensions: add IPv6 NETMAP extension
extensions: add NPT extension
Tom Eastep (1):
extensions: libxt_statistic: Fix save output
^ permalink raw reply
* [PATCH] ndisc: Ensure to reserve header space for encapsulation.
From: YOSHIFUJI Hideaki @ 2012-12-25 14:39 UTC (permalink / raw)
To: netdev, davem; +Cc: yoshfuji
We allocate sk_buff of MAX_HEADER + LL_RESERVED_SPACE(dev) + packet
length + dev->needed_tailroom, but reserved LL_RESERVED_SPACE(dev)
only. This means that space for encapsulation was placed at the end
of buffer. This does not make sense.
Reserve the space correctly, like this:
head data tail end
+--------------------------------------------------------------+
+ | | | |
+--------------------------------------------------------------+
|<--MAX_HEADER-->|<-hlen---->|<---ipv6 packet------>|<--tlen-->|
=LL_
RESERVED_
SPACE(dev)
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
---
net/ipv6/ndisc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 6574175..5f78ac2 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -404,7 +404,7 @@ static struct sk_buff *ndisc_build_skb(struct net_device *dev,
return NULL;
}
- skb_reserve(skb, hlen);
+ skb_reserve(skb, MAX_HEADER + hlen);
ip6_nd_hdr(sk, skb, dev, saddr, daddr, IPPROTO_ICMPV6, len);
skb->transport_header = skb->tail;
@@ -1449,7 +1449,7 @@ void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
goto release;
}
- skb_reserve(buff, hlen);
+ skb_reserve(buff, MAX_HEADER + hlen);
ip6_nd_hdr(sk, buff, dev, &saddr_buf, &ipv6_hdr(skb)->saddr,
IPPROTO_ICMPV6, len);
--
1.7.9.5
^ permalink raw reply related
* [PATCH] ipv6 mcast: Fix incorrect use of pskb_may_pull().
From: YOSHIFUJI Hideaki @ 2012-12-25 14:41 UTC (permalink / raw)
To: netdev, davem; +Cc: yoshfuji
pskb_may_pull(skb, len) ensures that data between
skb->data and skb->data + len - 1 can be accessed as a
linear buffer. When pskb_may_pull() is being used multiple
times for the same buffer without skb_pull(), the length
is not accumulated internally, and caller must do.
For example, assuming that we have done:
pskb_may_pull(skb, sizeof(struct icmp6hdr));
Afterwards, we have to do:
pskb_may_pull(skb, sizeof(struct mldv2_query))
instead of:
pskb_may_pull(skb, sizeof(struct mldv2_query) -
sizeof(struct icmp6hdr))
This incorrect use may cause bad thing if someone sends a message
with extension headers so that the message is fragmented just
after the icmpv6 header.
Of course we can try pulling step by step but there is almost
no benefit to doing so; MLD is our final protocol and we are
going to access almost the whole message as a linear buffer
anyway.
So, let's linearlize the buffer just after the trivial
sanity checks on IPv6 and ICMPv6 header in
igmp6_event_{query,report}().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
---
net/ipv6/mcast.c | 22 ++++++----------------
1 file changed, 6 insertions(+), 16 deletions(-)
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 28dfa5f..6b42b563 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1124,9 +1124,6 @@ int igmp6_event_query(struct sk_buff *skb)
int mark = 0;
int len;
- if (!pskb_may_pull(skb, sizeof(struct in6_addr)))
- return -EINVAL;
-
/* compute payload length excluding extension headers */
len = ntohs(ipv6_hdr(skb)->payload_len) + sizeof(struct ipv6hdr);
len -= skb_network_header_len(skb);
@@ -1136,10 +1133,12 @@ int igmp6_event_query(struct sk_buff *skb)
return -EINVAL;
idev = __in6_dev_get(skb->dev);
-
if (idev == NULL)
return 0;
+ if (!pskb_may_pull(skb, skb->len))
+ return -EINVAL;
+
mld = (struct mld_msg *)icmp6_hdr(skb);
group = &mld->mld_mca;
group_type = ipv6_addr_type(group);
@@ -1165,11 +1164,6 @@ int igmp6_event_query(struct sk_buff *skb)
/* clear deleted report items */
mld_clear_delrec(idev);
} else if (len >= 28) {
- int srcs_offset = sizeof(struct mld2_query) -
- sizeof(struct icmp6hdr);
- if (!pskb_may_pull(skb, srcs_offset))
- return -EINVAL;
-
mlh2 = (struct mld2_query *)skb_transport_header(skb);
max_delay = (MLDV2_MRC(ntohs(mlh2->mld2q_mrc))*HZ)/1000;
if (!max_delay)
@@ -1186,10 +1180,6 @@ int igmp6_event_query(struct sk_buff *skb)
}
/* mark sources to include, if group & source-specific */
if (mlh2->mld2q_nsrcs != 0) {
- if (!pskb_may_pull(skb, srcs_offset +
- ntohs(mlh2->mld2q_nsrcs) * sizeof(struct in6_addr)))
- return -EINVAL;
-
mlh2 = (struct mld2_query *)skb_transport_header(skb);
mark = 1;
}
@@ -1248,9 +1238,6 @@ int igmp6_event_report(struct sk_buff *skb)
skb->pkt_type != PACKET_BROADCAST)
return 0;
- if (!pskb_may_pull(skb, sizeof(*mld) - sizeof(struct icmp6hdr)))
- return -EINVAL;
-
mld = (struct mld_msg *)icmp6_hdr(skb);
/* Drop reports with not link local source */
@@ -1263,6 +1250,9 @@ int igmp6_event_report(struct sk_buff *skb)
if (idev == NULL)
return -ENODEV;
+ if (!pskb_may_pull(skb, sizeof(*mld)))
+ return -EINVAL;
+
/*
* Cancel the timer for this group
*/
--
1.7.9.5
^ permalink raw reply related
* Re: [PATCH] ipv6 mcast: Fix incorrect use of pskb_may_pull().
From: Eric Dumazet @ 2012-12-25 17:27 UTC (permalink / raw)
To: YOSHIFUJI Hideaki; +Cc: netdev, davem
In-Reply-To: <50D9BB19.2080801@linux-ipv6.org>
On Tue, 2012-12-25 at 23:41 +0900, YOSHIFUJI Hideaki wrote:
> + if (!pskb_may_pull(skb, skb->len))
> + return -EINVAL;
> +
skb_linearize(skb) might be more explicit then.
^ permalink raw reply
* Re: Why tcp_sacktag_walk specially process next_dup?
From: Ilpo Järvinen @ 2012-12-25 20:57 UTC (permalink / raw)
To: Yi Li; +Cc: Netdev
In-Reply-To: <50D9817D.2010206@gmail.com>
On Tue, 25 Dec 2012, Yi Li wrote:
> Hi Ilpo,
> I am a kernel newbie, maybe this question is simple.
> If you have some free time, could you help me ?
>
> I am reading your commit
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=68f8353b480e5f2e136c38a511abdbb88eaa8ce2,
> through this code path:
>
> tcp_sacktag_write_queue() {
> if (tcp_sack_cache_ok(tp, cache) && !dup_sack &&
> after(end_seq, cache->start_seq)) {
>
> /* Head todo? */
> if (before(start_seq, cache->start_seq)) {
> skb = tcp_sacktag_skip(skb, sk, &state,
> start_seq);
> skb = tcp_sacktag_walk(skb, sk, next_dup,
> &state,
> start_seq,
> cache->start_seq,
> dup_sack);
> }
>
> }
>
> and when we come to tcp_sacktag_walk(), comparing the current processing sack
> block
> with cache, we have: start_seq < cache->start_seq, and we now need to
> process the
> bytes between (start_seq, cache->start_seq) in tcp_write_queue.
>
> But in tcp_sacktag_walk(), why we first check the seqence space in next_dup ?
> I know this is about D-SACK, and I have read the rfc2883, but I am still
> confused.
> I have some questions:
> 1. Why we introduce a next_dup variable in SACK processing, is it better for
> performance optimization?
IIRC, it is optimization, probably to avoid need to do non-in-order ops
(it's few years since I wrote that already :)). But I'll take a look later
again.
> As there is dup_sack variable, will this pre-processing of sack block be
> mixed with dup_sack ?
IIRC, dup_sack tells that we're processing DSACK currently.
> 2. What does this test statement means in tcp_sacktag_walk:
> if ((next_dup != NULL) &&
> before(TCP_SKB_CB(skb)->seq, next_dup->end_seq)) {
> ---------------------> A
> in_sack = tcp_match_skb_to_sack(sk, skb,
> next_dup->start_seq,
> next_dup->end_seq);
> if (in_sack > 0)
> dup_sack = true;
> }
> as far as i know, if tcp_skb_pcout(skb)>1, this condition maybe exist:
> skb->seq <
> current_sack_block.start_seq < current_sack_block.end_seq <
> next_dup->start_seq < next_dup->end_seq.
>
> So, I do not understand what the code A really does.
Then tcp_match_skb_to_sack won't match this skb but just splits first if
necessary. It only does work if there's something to do already there
(but I cannot fully check the code easily now so you might have some
point I cannot follow based on this email alone).
BTW, these two conditions will always hold nowadays due to SACK block
validation:
> current_sack_block.start_seq < current_sack_block.end_seq
> next_dup->start_seq < next_dup->end_seq.
--
i.
^ permalink raw reply
* Re: [PATCH 19/29] batman-adv: fix random jitter calculation
From: Akinobu Mita @ 2012-12-25 21:35 UTC (permalink / raw)
To: Antonio Quartulli
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Simon Wunderlich,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Marek Lindner,
David S. Miller
In-Reply-To: <20121225112657.GC2604-E/2OGukznS5g9hUCZPvPmw@public.gmane.org>
2012/12/25 Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>:
> On Mon, Dec 24, 2012 at 11:14:06AM +0900, Akinobu Mita wrote:
>> batadv_iv_ogm_emit_send_time() attempts to calculates a random integer
>> in the range of 'orig_interval +- BATADV_JITTER' by the below lines.
>>
>> msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER;
>> msecs += (random32() % 2 * BATADV_JITTER);
>>
>> But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER'
>> because '%' and '*' have same precedence and associativity is
>> left-to-right.
>>
>> This adds the parentheses at the appropriate position so that it matches
>> original intension.
>>
>> Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> Cc: Marek Lindner <lindner_marek-LWAfsSFWpa4@public.gmane.org>
>> Cc: Simon Wunderlich <siwu-MaAgPAbsBIVS8oHt8HbXEIQuADTiUCJX@public.gmane.org>
>> Cc: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
>> Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r@public.gmane.org
>> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> ---
>
> Acked-by: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
>
> But I would suggest to apply this change to net, since it is a fix.
I agree.
David, please consider to apply this patch for 3.8-rc*.
^ permalink raw reply
* Re: [PATCH 19/29] batman-adv: fix random jitter calculation
From: David Miller @ 2012-12-25 21:37 UTC (permalink / raw)
To: akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
siwu-MaAgPAbsBIVS8oHt8HbXEIQuADTiUCJX,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, lindner_marek-LWAfsSFWpa4
In-Reply-To: <CAC5umyh1d9XxH2We4e2rjV-OYhbP2ObGC_iwrDeff2hphJ7TzQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
From: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Wed, 26 Dec 2012 06:35:37 +0900
> 2012/12/25 Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>:
>> On Mon, Dec 24, 2012 at 11:14:06AM +0900, Akinobu Mita wrote:
>>> batadv_iv_ogm_emit_send_time() attempts to calculates a random integer
>>> in the range of 'orig_interval +- BATADV_JITTER' by the below lines.
>>>
>>> msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER;
>>> msecs += (random32() % 2 * BATADV_JITTER);
>>>
>>> But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER'
>>> because '%' and '*' have same precedence and associativity is
>>> left-to-right.
>>>
>>> This adds the parentheses at the appropriate position so that it matches
>>> original intension.
>>>
>>> Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>> Cc: Marek Lindner <lindner_marek-LWAfsSFWpa4@public.gmane.org>
>>> Cc: Simon Wunderlich <siwu-MaAgPAbsBIVS8oHt8HbXEIQuADTiUCJX@public.gmane.org>
>>> Cc: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
>>> Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r@public.gmane.org
>>> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>>> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> ---
>>
>> Acked-by: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
>>
>> But I would suggest to apply this change to net, since it is a fix.
>
> I agree.
> David, please consider to apply this patch for 3.8-rc*.
>
All patches I should consider seriously should be properly
reposted for review and inclusion.
^ permalink raw reply
* Re: [PATCH 28/29] net/: rename net_random() to prandom_u32()
From: Neil Horman @ 2012-12-26 0:42 UTC (permalink / raw)
To: Akinobu Mita
Cc: linux-kernel, akpm, Jesse Gross, Venkat Venkatsubra,
Vlad Yasevich, Sridhar Samudrala, Steffen Klassert, Herbert Xu,
David S. Miller, linux-sctp, dev, netdev
In-Reply-To: <CAC5umyhhTJWfsFtZ_5Qm_HJHy2+sx7iMgcV=T=MMKLdi50-HQw@mail.gmail.com>
On Tue, Dec 25, 2012 at 08:47:26PM +0900, Akinobu Mita wrote:
> 2012/12/25 Neil Horman <nhorman@tuxdriver.com>:
> > On Mon, Dec 24, 2012 at 11:14:15AM +0900, Akinobu Mita wrote:
> >> Use more preferable function name which implies using a pseudo-random
> >> number generator.
> >>
> >> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> >> Cc: Jesse Gross <jesse@nicira.com>
> >> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
> >> Cc: Vlad Yasevich <vyasevich@gmail.com>
> >> Cc: Sridhar Samudrala <sri@us.ibm.com>
> >> Cc: Neil Horman <nhorman@tuxdriver.com>
> >> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> >> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> >> Cc: "David S. Miller" <davem@davemloft.net>
> >> Cc: linux-sctp@vger.kernel.org
> >> Cc: dev@openvswitch.org
> >> Cc: netdev@vger.kernel.org
> >> ---
> >> include/net/red.h | 2 +-
> >> net/802/garp.c | 2 +-
> >> net/openvswitch/actions.c | 2 +-
> >> net/rds/bind.c | 2 +-
> >> net/sctp/socket.c | 2 +-
> >> net/xfrm/xfrm_state.c | 2 +-
> >> 6 files changed, 6 insertions(+), 6 deletions(-)
> >>
> > I'm largely indifferent to this patch, but I kind of feel like its just churn.
> > Whats the real advantage in making this change? I grant that it clearly
> > indicates the type of random number generator we're using at a given call site,
> > But for those using net_random, you probably don't care too much about
> > the source of your random bits. If you did really want true random vs.
> > pseudo-random data, you need to explicitly use the right call. You're previous
> > patch series did good cleanup on differentiating the different random calls, but
> > this just seems like its removing what is otherwise useful indirection.
>
> I overlooked the importance of net_random() indirection.
> Thanks for the feedback. I'll leave all net_random() callers as-is in
> the next version.
Well, I guess I should qualify my opinion. I find it useful personally (the
generation of nonces in many cases can be left to most any pseudo random
generator that the system deems is a 'good enough' balance between a fast
generator that doesn't block on low entropy and a reasonably secure one that
doesn't allow for easy prediction. As those needs and factors change, its nice
to have a set point to change them at. If you (or anyone else has a differing
opinion, I'm happy to listen to it.
Regards
Neil
>
^ permalink raw reply
* Re: [PATCH 4/4 v2] net/smsc911x: Provide common clock functionality
From: Linus Walleij @ 2012-12-26 0:51 UTC (permalink / raw)
To: Lee Jones
Cc: Russell King - ARM Linux, Steve Glendinning, Robert Marklund,
linus.walleij, arnd, netdev, linux-kernel, linux-arm-kernel
In-Reply-To: <20121221114105.GP2691@gmail.com>
On Fri, Dec 21, 2012 at 12:41 PM, Lee Jones <lee.jones@linaro.org> wrote:
> + if (IS_ERR(pdata->clk)) {
> + ret = clk_prepare_enable(pdata->clk);
> + if (ret < 0)
> + netdev_err(ndev, "failed to enable clock %d\n", ret);
> + }
I think you got all of these backwards now, shouldn't it be if
(!IS_ERR(pdata->clk)) { } ...?
It's late here but enlighten me if I don't get it.
> + if (IS_ERR(pdata->clk))
> + clk_disable_unprepare(pdata->clk);
Dito.
> + /* Request clock */
> + pdata->clk = clk_get(&pdev->dev, NULL);
> + if (IS_ERR(pdata->clk))
> + netdev_warn(ndev, "couldn't get clock %li\n", PTR_ERR(pdata->clk));
This one seems correct though.
> + /* Free clock */
> + if (IS_ERR(pdata->clk)) {
> + clk_put(pdata->clk);
> + pdata->clk = NULL;
> + }
Should be !IS_ERR()
Yours,
Linus Walleij
^ permalink raw reply
* Re: [PATCH] ipv6 mcast: Fix incorrect use of pskb_may_pull().
From: YOSHIFUJI Hideaki @ 2012-12-26 3:11 UTC (permalink / raw)
To: Eric Dumazet, 'netdev@vger.kernel.org'; +Cc: davem, YOSHIFUJI Hideaki
In-Reply-To: <1356456420.20133.15020.camel@edumazet-glaptop>
Eric Dumazet wrote:
> On Tue, 2012-12-25 at 23:41 +0900, YOSHIFUJI Hideaki wrote:
>
>> + if (!pskb_may_pull(skb, skb->len))
>> + return -EINVAL;
>> +
>
> skb_linearize(skb) might be more explicit then.
Oh, I've found errors, so I'll respin.
--yoshfuji
^ permalink raw reply
* [PATCH V2] ipv6 mcast: Fix incorrect use of pskb_may_pull().
From: YOSHIFUJI Hideaki @ 2012-12-26 3:12 UTC (permalink / raw)
To: 'netdev@vger.kernel.org', David Miller
Cc: YOSHIFUJI Hideaki, erdnetdev
pskb_may_pull(skb, len) ensures that len bytes from skb->data
are available in a linear array. When pskb_may_pull() is
being used multiple times for the same buffer without
skb_pull(), the length is not accumulated.
For example, assuming that we have done:
pskb_may_pull(skb, sizeof(struct icmp6hdr))
Here, we have to do:
pskb_may_pull(skb, sizeof(struct mld2_query))
instead of:
pskb_may_pull(skb, sizeof(struct mld2_query) -
sizeof(struct icmp6hdr))
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
---
net/ipv6/mcast.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 28dfa5f..5d91832 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1124,7 +1124,7 @@ int igmp6_event_query(struct sk_buff *skb)
int mark = 0;
int len;
- if (!pskb_may_pull(skb, sizeof(struct in6_addr)))
+ if (!pskb_may_pull(skb, sizeof(struct icmp6hdr) + sizeof(struct in6_addr)))
return -EINVAL;
/* compute payload length excluding extension headers */
@@ -1165,9 +1165,7 @@ int igmp6_event_query(struct sk_buff *skb)
/* clear deleted report items */
mld_clear_delrec(idev);
} else if (len >= 28) {
- int srcs_offset = sizeof(struct mld2_query) -
- sizeof(struct icmp6hdr);
- if (!pskb_may_pull(skb, srcs_offset))
+ if (!pskb_may_pull(skb, sizeof(struct mld2_query)))
return -EINVAL;
mlh2 = (struct mld2_query *)skb_transport_header(skb);
@@ -1186,8 +1184,9 @@ int igmp6_event_query(struct sk_buff *skb)
}
/* mark sources to include, if group & source-specific */
if (mlh2->mld2q_nsrcs != 0) {
- if (!pskb_may_pull(skb, srcs_offset +
- ntohs(mlh2->mld2q_nsrcs) * sizeof(struct in6_addr)))
+ if (!pskb_may_pull(skb,
+ sizeof(struct mld2_query) +
+ ntohs(mlh2->mld2q_nsrcs) * sizeof(struct in6_addr)))
return -EINVAL;
mlh2 = (struct mld2_query *)skb_transport_header(skb);
@@ -1248,7 +1247,7 @@ int igmp6_event_report(struct sk_buff *skb)
skb->pkt_type != PACKET_BROADCAST)
return 0;
- if (!pskb_may_pull(skb, sizeof(*mld) - sizeof(struct icmp6hdr)))
+ if (!pskb_may_pull(skb, sizeof(*mld)))
return -EINVAL;
mld = (struct mld_msg *)icmp6_hdr(skb);
--
1.7.9.5
^ permalink raw reply related
* Re: [PATCH v2 0/2] cpts fixes for v3.8-rc2
From: Mugunthan V N @ 2012-12-26 4:12 UTC (permalink / raw)
To: Richard Cochran
Cc: netdev, linux-arm-kernel, linux-omap, David Miller,
Cyril Chemparathy, Sergei Shtylyov
In-Reply-To: <cover.1356331925.git.richardcochran@gmail.com>
On 12/24/2012 12:49 PM, Richard Cochran wrote:
> Changed in v2:
> Use clk_prepare_enable instead of clk_prepare + clk_enable.
>
> The new cpts driver has two small issues, but it otherwise seems to be
> working in -rc1.
>
> Thanks,
> Richard
>
> Richard Cochran (2):
> cpts: fix build error by removing useless code.
> cpts: fix a run time warn_on.
>
> drivers/net/ethernet/ti/cpts.c | 3 +--
> drivers/net/ethernet/ti/cpts.h | 1 -
> 2 files changed, 1 insertions(+), 3 deletions(-)
>
Looks good to me.
Acked-by : Mugunthan V N <mugunthanvnm@ti.com>
^ permalink raw reply
* [PATCH] netprio_cgroup: define sk_cgrp_prioidx only if NETPRIO_CGROUP is enabled
From: Li Zefan @ 2012-12-26 6:48 UTC (permalink / raw)
To: David Miller; +Cc: Neil Horman, LKML, netdev-u79uwXL29TY76Z2rM5mHXA, Cgroups
sock->sk_cgrp_prioidx won't be used at all if CONFIG_NETPRIO_CGROUP=n.
Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
include/net/sock.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 93a6745..182ca99 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -367,7 +367,7 @@ struct sock {
unsigned short sk_ack_backlog;
unsigned short sk_max_ack_backlog;
__u32 sk_priority;
-#ifdef CONFIG_CGROUPS
+#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
__u32 sk_cgrp_prioidx;
#endif
struct pid *sk_peer_pid;
--
1.8.0.2
^ permalink raw reply related
* [RFC PATCH] virtio-net: reset virtqueue affinity when doing cpu hotplug
From: Wanlong Gao @ 2012-12-26 7:06 UTC (permalink / raw)
To: linux-kernel; +Cc: Michael S. Tsirkin, netdev, virtualization
Add a cpu notifier to virtio-net, so that we can reset the
virtqueue affinity if the cpu hotplug happens. It improve
the performance through enabling or disabling the virtqueue
affinity after doing cpu hotplug.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
drivers/net/virtio_net.c | 39 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 38 insertions(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a6fcf15..9710cf4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,7 @@
#include <linux/scatterlist.h>
#include <linux/if_vlan.h>
#include <linux/slab.h>
+#include <linux/cpu.h>
static int napi_weight = 128;
module_param(napi_weight, int, 0444);
@@ -34,6 +35,8 @@ static bool csum = true, gso = true;
module_param(csum, bool, 0444);
module_param(gso, bool, 0444);
+static bool cpu_hotplug = false;
+
/* FIXME: MTU in config. */
#define MAX_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
#define GOOD_COPY_LEN 128
@@ -1041,6 +1044,26 @@ static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
vi->affinity_hint_set = false;
}
+static int virtnet_cpu_callback(struct notifier_block *nfb,
+ unsigned long action, void *hcpu)
+{
+ switch(action) {
+ case CPU_ONLINE:
+ case CPU_ONLINE_FROZEN:
+ case CPU_DEAD:
+ case CPU_DEAD_FROZEN:
+ cpu_hotplug = true;
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block virtnet_cpu_notifier = {
+ .notifier_call = virtnet_cpu_callback,
+};
+
static void virtnet_get_ringparam(struct net_device *dev,
struct ethtool_ringparam *ring)
{
@@ -1131,7 +1154,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
*/
static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
{
- int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+ int txq;
+
+ if (unlikely(cpu_hotplug == true)) {
+ virtnet_set_affinity(netdev_priv(dev), true);
+ cpu_hotplug = false;
+ }
+
+ txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
smp_processor_id();
while (unlikely(txq >= dev->real_num_tx_queues))
@@ -1248,6 +1278,8 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
{
struct virtio_device *vdev = vi->vdev;
+ unregister_hotcpu_notifier(&virtnet_cpu_notifier);
+
virtnet_set_affinity(vi, false);
vdev->config->del_vqs(vdev);
@@ -1372,6 +1404,11 @@ static int init_vqs(struct virtnet_info *vi)
goto err_free;
virtnet_set_affinity(vi, true);
+
+ ret = register_hotcpu_notifier(&virtnet_cpu_notifier);
+ if (ret)
+ goto err_free;
+
return 0;
err_free:
--
1.8.0
^ permalink raw reply related
* Re: [RFC PATCH] virtio-net: reset virtqueue affinity when doing cpu hotplug
From: Jason Wang @ 2012-12-26 10:06 UTC (permalink / raw)
To: Wanlong Gao; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <1356505614-16683-1-git-send-email-gaowanlong@cn.fujitsu.com>
On 12/26/2012 03:06 PM, Wanlong Gao wrote:
> Add a cpu notifier to virtio-net, so that we can reset the
> virtqueue affinity if the cpu hotplug happens. It improve
> the performance through enabling or disabling the virtqueue
> affinity after doing cpu hotplug.
Hi Wanlong:
Thanks for looking at this.
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: virtualization@lists.linux-foundation.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
> drivers/net/virtio_net.c | 39 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a6fcf15..9710cf4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -26,6 +26,7 @@
> #include <linux/scatterlist.h>
> #include <linux/if_vlan.h>
> #include <linux/slab.h>
> +#include <linux/cpu.h>
>
> static int napi_weight = 128;
> module_param(napi_weight, int, 0444);
> @@ -34,6 +35,8 @@ static bool csum = true, gso = true;
> module_param(csum, bool, 0444);
> module_param(gso, bool, 0444);
>
> +static bool cpu_hotplug = false;
> +
> /* FIXME: MTU in config. */
> #define MAX_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> #define GOOD_COPY_LEN 128
> @@ -1041,6 +1044,26 @@ static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
> vi->affinity_hint_set = false;
> }
>
> +static int virtnet_cpu_callback(struct notifier_block *nfb,
> + unsigned long action, void *hcpu)
> +{
> + switch(action) {
> + case CPU_ONLINE:
> + case CPU_ONLINE_FROZEN:
> + case CPU_DEAD:
> + case CPU_DEAD_FROZEN:
> + cpu_hotplug = true;
> + break;
> + default:
> + break;
> + }
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block virtnet_cpu_notifier = {
> + .notifier_call = virtnet_cpu_callback,
> +};
> +
> static void virtnet_get_ringparam(struct net_device *dev,
> struct ethtool_ringparam *ring)
> {
> @@ -1131,7 +1154,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> */
> static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
> {
> - int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> + int txq;
> +
> + if (unlikely(cpu_hotplug == true)) {
> + virtnet_set_affinity(netdev_priv(dev), true);
> + cpu_hotplug = false;
> + }
> +
Why don't you just do this in callback?
btw. Does qemu/kvm support cpu-hotplug now?
> + txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> smp_processor_id();
>
> while (unlikely(txq >= dev->real_num_tx_queues))
> @@ -1248,6 +1278,8 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
> {
> struct virtio_device *vdev = vi->vdev;
>
> + unregister_hotcpu_notifier(&virtnet_cpu_notifier);
> +
> virtnet_set_affinity(vi, false);
>
> vdev->config->del_vqs(vdev);
> @@ -1372,6 +1404,11 @@ static int init_vqs(struct virtnet_info *vi)
> goto err_free;
>
> virtnet_set_affinity(vi, true);
> +
> + ret = register_hotcpu_notifier(&virtnet_cpu_notifier);
> + if (ret)
> + goto err_free;
> +
> return 0;
>
> err_free:
^ permalink raw reply
* Re: [RFC PATCH] virtio-net: reset virtqueue affinity when doing cpu hotplug
From: Wanlong Gao @ 2012-12-26 10:19 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <50DACC26.7050409@redhat.com>
On 12/26/2012 06:06 PM, Jason Wang wrote:
> On 12/26/2012 03:06 PM, Wanlong Gao wrote:
>> Add a cpu notifier to virtio-net, so that we can reset the
>> virtqueue affinity if the cpu hotplug happens. It improve
>> the performance through enabling or disabling the virtqueue
>> affinity after doing cpu hotplug.
>
> Hi Wanlong:
>
> Thanks for looking at this.
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: virtualization@lists.linux-foundation.org
>> Cc: netdev@vger.kernel.org
>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>> ---
>> drivers/net/virtio_net.c | 39 ++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 38 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index a6fcf15..9710cf4 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -26,6 +26,7 @@
>> #include <linux/scatterlist.h>
>> #include <linux/if_vlan.h>
>> #include <linux/slab.h>
>> +#include <linux/cpu.h>
>>
>> static int napi_weight = 128;
>> module_param(napi_weight, int, 0444);
>> @@ -34,6 +35,8 @@ static bool csum = true, gso = true;
>> module_param(csum, bool, 0444);
>> module_param(gso, bool, 0444);
>>
>> +static bool cpu_hotplug = false;
>> +
>> /* FIXME: MTU in config. */
>> #define MAX_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
>> #define GOOD_COPY_LEN 128
>> @@ -1041,6 +1044,26 @@ static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
>> vi->affinity_hint_set = false;
>> }
>>
>> +static int virtnet_cpu_callback(struct notifier_block *nfb,
>> + unsigned long action, void *hcpu)
>> +{
>> + switch(action) {
>> + case CPU_ONLINE:
>> + case CPU_ONLINE_FROZEN:
>> + case CPU_DEAD:
>> + case CPU_DEAD_FROZEN:
>> + cpu_hotplug = true;
>> + break;
>> + default:
>> + break;
>> + }
>> + return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block virtnet_cpu_notifier = {
>> + .notifier_call = virtnet_cpu_callback,
>> +};
>> +
>> static void virtnet_get_ringparam(struct net_device *dev,
>> struct ethtool_ringparam *ring)
>> {
>> @@ -1131,7 +1154,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>> */
>> static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
>> {
>> - int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
>> + int txq;
>> +
>> + if (unlikely(cpu_hotplug == true)) {
>> + virtnet_set_affinity(netdev_priv(dev), true);
>> + cpu_hotplug = false;
>> + }
>> +
>
> Why don't you just do this in callback?
Callback can just give us a "hcpu", can't get the virtnet_info from callback. Am I missing something?
>
> btw. Does qemu/kvm support cpu-hotplug now?
>From http://www.linux-kvm.org/page/CPUHotPlug, I saw that qemu-kvm can support hotplug
but failed to merge to qemu.git, right?
Thanks,
Wanlong Gao
^ permalink raw reply
* Re: [RFC PATCH] virtio-net: reset virtqueue affinity when doing cpu hotplug
From: Michael S. Tsirkin @ 2012-12-26 10:46 UTC (permalink / raw)
To: Wanlong Gao; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <1356505614-16683-1-git-send-email-gaowanlong@cn.fujitsu.com>
On Wed, Dec 26, 2012 at 03:06:54PM +0800, Wanlong Gao wrote:
> Add a cpu notifier to virtio-net, so that we can reset the
> virtqueue affinity if the cpu hotplug happens. It improve
> the performance through enabling or disabling the virtqueue
> affinity after doing cpu hotplug.
>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: virtualization@lists.linux-foundation.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Thanks for looking into this.
Some comments:
1. Looks like the logic in
virtnet_set_affinity (and in virtnet_select_queue)
will not work very well when CPU IDs are not
consequitive. This can happen with hot unplug.
Maybe we should add a VQ allocator, and defining
a per-cpu variable specifying the VQ instead
of using CPU ID.
2. The below code seems racy e.g. when CPU is added
during device init.
3. using a global cpu_hotplug seems inelegant.
In any case we should document what is the
meaning of this variable.
> ---
> drivers/net/virtio_net.c | 39 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a6fcf15..9710cf4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -26,6 +26,7 @@
> #include <linux/scatterlist.h>
> #include <linux/if_vlan.h>
> #include <linux/slab.h>
> +#include <linux/cpu.h>
>
> static int napi_weight = 128;
> module_param(napi_weight, int, 0444);
> @@ -34,6 +35,8 @@ static bool csum = true, gso = true;
> module_param(csum, bool, 0444);
> module_param(gso, bool, 0444);
>
> +static bool cpu_hotplug = false;
> +
> /* FIXME: MTU in config. */
> #define MAX_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> #define GOOD_COPY_LEN 128
> @@ -1041,6 +1044,26 @@ static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
> vi->affinity_hint_set = false;
> }
>
> +static int virtnet_cpu_callback(struct notifier_block *nfb,
> + unsigned long action, void *hcpu)
> +{
> + switch(action) {
> + case CPU_ONLINE:
> + case CPU_ONLINE_FROZEN:
> + case CPU_DEAD:
> + case CPU_DEAD_FROZEN:
> + cpu_hotplug = true;
> + break;
> + default:
> + break;
> + }
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block virtnet_cpu_notifier = {
> + .notifier_call = virtnet_cpu_callback,
> +};
> +
> static void virtnet_get_ringparam(struct net_device *dev,
> struct ethtool_ringparam *ring)
> {
> @@ -1131,7 +1154,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> */
> static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
> {
> - int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> + int txq;
> +
> + if (unlikely(cpu_hotplug == true)) {
> + virtnet_set_affinity(netdev_priv(dev), true);
> + cpu_hotplug = false;
> + }
> +
> + txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> smp_processor_id();
>
> while (unlikely(txq >= dev->real_num_tx_queues))
> @@ -1248,6 +1278,8 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
> {
> struct virtio_device *vdev = vi->vdev;
>
> + unregister_hotcpu_notifier(&virtnet_cpu_notifier);
> +
> virtnet_set_affinity(vi, false);
>
> vdev->config->del_vqs(vdev);
> @@ -1372,6 +1404,11 @@ static int init_vqs(struct virtnet_info *vi)
> goto err_free;
>
> virtnet_set_affinity(vi, true);
> +
> + ret = register_hotcpu_notifier(&virtnet_cpu_notifier);
> + if (ret)
> + goto err_free;
> +
> return 0;
>
> err_free:
> --
> 1.8.0
^ permalink raw reply
* [PATCH] batman-adv: fix random jitter calculation
From: Akinobu Mita @ 2012-12-26 12:32 UTC (permalink / raw)
To: linux-kernel, netdev
Cc: Akinobu Mita, Marek Lindner, Simon Wunderlich, Antonio Quartulli,
b.a.t.m.a.n, David S. Miller
batadv_iv_ogm_emit_send_time() attempts to calculates a random integer
in the range of 'orig_interval +- BATADV_JITTER' by the below lines.
msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER;
msecs += (random32() % 2 * BATADV_JITTER);
But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER'
because '%' and '*' have same precedence and associativity is
left-to-right.
This adds the parentheses at the appropriate position so that it matches
original intension.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Cc: Marek Lindner <lindner_marek@yahoo.de>
Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Cc: Antonio Quartulli <ordex@autistici.org>
Cc: b.a.t.m.a.n@lists.open-mesh.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
---
This patch is extracted from the patch series "rename random32 and
net_random to prandom" since it is a fix.
net/batman-adv/bat_iv_ogm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 9f3925a..7d02ebd 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -123,7 +123,7 @@ batadv_iv_ogm_emit_send_time(const struct batadv_priv *bat_priv)
unsigned int msecs;
msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER;
- msecs += (random32() % 2 * BATADV_JITTER);
+ msecs += random32() % (2 * BATADV_JITTER);
return jiffies + msecs_to_jiffies(msecs);
}
--
1.7.11.7
^ permalink raw reply related
* Re: [RFC PATCH] virtio-net: reset virtqueue affinity when doing cpu hotplug
From: Eric Dumazet @ 2012-12-26 15:51 UTC (permalink / raw)
To: Wanlong Gao
Cc: linux-kernel, Rusty Russell, Michael S. Tsirkin, Jason Wang,
virtualization, netdev
In-Reply-To: <1356505614-16683-1-git-send-email-gaowanlong@cn.fujitsu.com>
On Wed, 2012-12-26 at 15:06 +0800, Wanlong Gao wrote:
> Add a cpu notifier to virtio-net, so that we can reset the
> virtqueue affinity if the cpu hotplug happens. It improve
> the performance through enabling or disabling the virtqueue
> affinity after doing cpu hotplug.
>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: virtualization@lists.linux-foundation.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
> drivers/net/virtio_net.c | 39 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a6fcf15..9710cf4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -26,6 +26,7 @@
> #include <linux/scatterlist.h>
> #include <linux/if_vlan.h>
> #include <linux/slab.h>
> +#include <linux/cpu.h>
>
> static int napi_weight = 128;
> module_param(napi_weight, int, 0444);
> @@ -34,6 +35,8 @@ static bool csum = true, gso = true;
> module_param(csum, bool, 0444);
> module_param(gso, bool, 0444);
>
> +static bool cpu_hotplug = false;
> +
> /* FIXME: MTU in config. */
> #define MAX_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> #define GOOD_COPY_LEN 128
> @@ -1041,6 +1044,26 @@ static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
> vi->affinity_hint_set = false;
> }
>
> +static int virtnet_cpu_callback(struct notifier_block *nfb,
> + unsigned long action, void *hcpu)
> +{
> + switch(action) {
> + case CPU_ONLINE:
> + case CPU_ONLINE_FROZEN:
> + case CPU_DEAD:
> + case CPU_DEAD_FROZEN:
> + cpu_hotplug = true;
> + break;
> + default:
> + break;
> + }
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block virtnet_cpu_notifier = {
> + .notifier_call = virtnet_cpu_callback,
> +};
> +
> static void virtnet_get_ringparam(struct net_device *dev,
> struct ethtool_ringparam *ring)
> {
> @@ -1131,7 +1154,14 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> */
> static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
> {
> - int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> + int txq;
> +
> + if (unlikely(cpu_hotplug == true)) {
> + virtnet_set_affinity(netdev_priv(dev), true);
> + cpu_hotplug = false;
> + }
> +
> + txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> smp_processor_id();
>
> while (unlikely(txq >= dev->real_num_tx_queues))
> @@ -1248,6 +1278,8 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
> {
> struct virtio_device *vdev = vi->vdev;
>
> + unregister_hotcpu_notifier(&virtnet_cpu_notifier);
> +
> virtnet_set_affinity(vi, false);
>
> vdev->config->del_vqs(vdev);
> @@ -1372,6 +1404,11 @@ static int init_vqs(struct virtnet_info *vi)
> goto err_free;
>
> virtnet_set_affinity(vi, true);
> +
> + ret = register_hotcpu_notifier(&virtnet_cpu_notifier);
> + if (ret)
> + goto err_free;
> +
> return 0;
>
> err_free:
It looks like this patch assumes virtio_net supports a single instance.
Try your patch with two instances, I am pretty sure it wont do very
well.
It seems to me you need something else than a single boolean.
A sequence number for example should be better...
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox