Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 08/18] virtio_ring: support for used_event idx feature
From: Rusty Russell @ 2011-05-09  4:17 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <dd16b9f3bad847b0eadc7f94368e27982c1a7560.1304541919.git.mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Wed, 4 May 2011 23:51:38 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Add support for the used_event idx feature: when enabling
> interrupts, publish the current avail index value to
> the host so that we get interrupts on the next update.
> 
> Signed-off-by: Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/virtio/virtio_ring.c |   14 ++++++++++++++
>  1 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 507d6eb..3a3ed75 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -320,6 +320,14 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
>  	ret = vq->data[i];
>  	detach_buf(vq, i);
>  	vq->last_used_idx++;
> +	/* If we expect an interrupt for the next entry, tell host
> +	 * by writing event index and flush out the write before
> +	 * the read in the next get_buf call. */
> +	if (!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
> +		vring_used_event(&vq->vring) = vq->last_used_idx;
> +		virtio_mb();
> +	}
> +

Hmm, so you're still using the avail->flags; it's just if thresholding
is enabled the host will ignore it?

It's a little subtle, but it keeps this patch small.  Perhaps we'll want
to make it more explicit later.

Thanks,
Rusty.

^ permalink raw reply

* Re: linux-next: build warning after merge of the net tree
From: David Miller @ 2011-05-09  4:13 UTC (permalink / raw)
  To: sfr; +Cc: netdev, linux-next, linux-kernel
In-Reply-To: <20110509134613.bcf0b017.sfr@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 9 May 2011 13:46:13 +1000

> Hi all,
> 
> After merging the net tree, today's linux-next build (x86_64 allmodconfig)
> produced this warning:
> 
> net/sctp/protocol.c: In function 'sctp_v4_xmit':
> net/sctp/protocol.c:842: warning: format '%p' expects type 'void *', but argument 5 has type '__be32'
> net/sctp/protocol.c:842: warning: format '%p' expects type 'void *', but argument 6 has type '__be32'
> 
> Caused by commit f1c0a276ea17 ("sctp: Don't use rt->rt_{src,dst} in
> sctp_v4_xmit()").

I'll fix this, thanks for the report Stephen.

^ permalink raw reply

* Re: [PATCH 06/18] virtio_ring: avail event index interface
From: Rusty Russell @ 2011-05-09  4:13 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Carsten Otte, Christian Borntraeger, linux390, Martin Schwidefsky,
	Heiko Carstens, Shirley Ma, lguest, linux-kernel, virtualization,
	netdev, linux-s390, kvm, Krishna Kumar, Tom Lendacky, steved,
	habanero
In-Reply-To: <b00ef5d572ec9bc3b884c496025512abc1f14349.1304541919.git.mst@redhat.com>

On Wed, 4 May 2011 23:51:19 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> Define a new feature bit for the host to
> declare that it uses an avail_event index
> (like Xen) instead of a feature bit
> to enable/disable interrupts.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  include/linux/virtio_ring.h |   11 ++++++++---
>  1 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> index f5c1b75..f791772 100644
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -32,6 +32,9 @@
>  /* The Guest publishes the used index for which it expects an interrupt
>   * at the end of the avail ring. Host should ignore the avail->flags field. */
>  #define VIRTIO_RING_F_USED_EVENT_IDX	29
> +/* The Host publishes the avail index for which it expects a kick
> + * at the end of the used ring. Guest should ignore the used->flags field. */
> +#define VIRTIO_RING_F_AVAIL_EVENT_IDX	32

Are you really sure we want to separate the two?  Seems a little simpler
to have one bit to mean "we're publishing our threshold".  For someone
implementing this from scratch, it's a little simpler.

Or are there cases where the old style makes more sense?

Thanks,
Rusty.

^ permalink raw reply

* Re: [PATCH 1/2] ipv4: Pass flow keys down into datagram packet building engine.
From: David Miller @ 2011-05-09  4:07 UTC (permalink / raw)
  To: joe; +Cc: netdev
In-Reply-To: <Pine.LNX.4.30.1105081811570.26257-100000@mail.perches.com>

From: Joe Perches <joe@perches.com>
Date: Sun, 8 May 2011 18:14:46 -0700 (PDT)

> On Sun, 8 May 2011, David Miller wrote:
>> +static struct rtable *icmp_route_lookup(struct net *net,
>> +					struct flowi4 *fl4,
>> +					struct sk_buff *skb_in,
>>  					const struct iphdr *iph,
>>  					__be32 saddr, u8 tos,
>>  					int type, int code,
>>  					struct icmp_bxm *param)
>>  {
> []
>> +	memset(&fl4, 0, sizeof(fl4));
> 
> 	memset(fl4, 0, sizeof(*fl4));
> 
>> +	fl4->daddr = (param->replyopts.opt.opt.srr ?
>> +		      param->replyopts.opt.opt.faddr : iph->saddr);
> 

Thanks Joe :-/  I'll fix that.

^ permalink raw reply

* linux-next: build warning after merge of the net tree
From: Stephen Rothwell @ 2011-05-09  3:46 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: linux-next, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 568 bytes --]

Hi all,

After merging the net tree, today's linux-next build (x86_64 allmodconfig)
produced this warning:

net/sctp/protocol.c: In function 'sctp_v4_xmit':
net/sctp/protocol.c:842: warning: format '%p' expects type 'void *', but argument 5 has type '__be32'
net/sctp/protocol.c:842: warning: format '%p' expects type 'void *', but argument 6 has type '__be32'

Caused by commit f1c0a276ea17 ("sctp: Don't use rt->rt_{src,dst} in
sctp_v4_xmit()").

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: future developments of usbnet
From: Ming Lei @ 2011-05-09  3:26 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <201105062045.37336.oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>

Hi,

2011/5/7 Oliver Neukum <oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>:
> Hi,
>
> I'd like to get a feeling what people are working out there regarding usbnet.
> So please, if you do something, or think something ought to be done, please
> speak up now.
>
> IMHO usbnet needs better support for
>
> - batching protocols
> - double buffering on the rx path
>
> with the latter having higher priority.
>
> Coments?

Maybe another thing about rx should be considered too: we should introduce
one flow control mechanism into rx path so that too much coming packets can
not consume many of memory and cause out of memory for atomic allocation.

Current implementation has such kind of risk certainly, as i explained in the
patch with title below:

       usbnet: runtime pm: fix out of memory

thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [v2 000/115] faster tree-based sysctl implementation
From: Eric W. Biederman @ 2011-05-09  3:11 UTC (permalink / raw)
  To: Lucian Adrian Grijincu
  Cc: linux-kernel, netdev, Alexey Dobriyan, Octavian Purdila,
	David S . Miller
In-Reply-To: <1304894407-32201-1-git-send-email-lucian.grijincu@gmail.com>

Lucian Adrian Grijincu <lucian.grijincu@gmail.com> writes:

> Eric also asked for:
> - replacing the per-header list of subdirs with a rbtree.
>   -- Again, lack of time, and this can always be added at a later time
>      to optimize lookup and duplicate checks. At the moment this patch
>      series does not add a complexity regression over the previous
>      implementation, au contraire.

Instead you add a dubious mechanisms, to avoid duplicates checks
because the duplicate checks takes time.

A good internal data structure should have logN duplicate checks
and that should not be cost prohibitive. For maintainability of
the users a good internal data structure removes the need for
special cases, and it actually makes use the sysctl fast.
Your implementation continues to have a O(N^2) cost to use the
sysctl files.

Getting a good internal data structure is the most important part
of a making sysctls fast, and keeping them fast.  Your patchset
fails in that.

So at that level, a great big NAK on the patchset as it currently
exists.


You have made a key observation that we only need better data
structures for the directories and can leave off that work
for the callers.

Thank you for that.

Additionally thank you for splitting up your changes to the
core of sysctl so there is a chance of reviewing those.

With respect to review and merging.  The priorities should be:
1) Figure out what the users really need to do, before we change
   everything so we only need one sweeping pass through the users
   that hopefully simplifies them.

2) Ensure you have a version of the new interfaces ready at the
   start of the patchset, so the sweeping changes can be made
   incrementally.

3) Clean up the users.

4) Make simplifications because you don't have any old users left.

I very much agree that the data structures that sysctl uses today
are far from ideal, and that we can make a lot of headway.

That said I think there is some good work in your patchset and I may
try and cherry pick out the good bits that we can merge now.

Eric

^ permalink raw reply

* Re: [PATCH] net/bonding: adjust codingstyle for bond_3ad files
From: Håkon Løvdal @ 2011-05-09  1:30 UTC (permalink / raw)
  To: David Miller
  Cc: mfmooney, joe, aquini, kernel-janitors, fubar, andy, shemminger,
	netdev, nikai
In-Reply-To: <20110508.171204.27789908.davem@davemloft.net>

On 9 May 2011 02:12, David Miller <davem@davemloft.net> wrote:
> From: Håkon Løvdal <hlovdal@gmail.com>
> Date: Mon, 9 May 2011 02:08:41 +0200
>
>> void bond_3ad_state_machine_handler(struct work_struct *work)
>> {
>>         struct bonding *bond = container_of(work, struct bonding,
>>                                             ad_work.work);
>>         struct port *port;
>>         struct aggregator *aggregator;
>>
>>         read_lock(&bond->lock);
>>
>>         if (! bond->kill_timers) {
>>
>>                 //check if there are any slaves
>>                 if (bond->slave_cnt != 0) {
>>                         ...
>>                 }
>>                 queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
>>         }
>>         read_unlock(&bond->lock);
>> }
>>
>>
>> And this was what I trying to reccommend against (and which the
>> stackoverflow question is about).
>
> I really don't see anything wrong with either approach.

The example above looks innocent since it is so small. But written
this way, the all of core of the function ("...") will have two extra,
unnecessary indentation levels. My rule of thumb is to always test for
exceptions, not the normal case.


Expanding the example to cover the full function, it could be like the
following:


void bond_3ad_state_machine_handler(struct work_struct *work)
{
        struct bonding *bond = container_of(work, struct bonding,
                                            ad_work.work);
        struct port *port;
        struct aggregator *aggregator;

        read_lock(&bond->lock);

        if (! bond->kill_timers)
        {
                //check if there are any slaves
                if (bond->slave_cnt != 0)
                {
                        int uninitialized = 0;
                        // check if agg_select_timer timer after
initialize is timed out
                        if (BOND_AD_INFO(bond).agg_select_timer &&
!(--BOND_AD_INFO(bond).agg_select_timer)) {
                                // select the active aggregator for the bond
                                if ((port = __get_first_port(bond))) {
                                        if (port->slave) {
                                                aggregator =
__get_first_agg(port);

ad_agg_selection_logic(aggregator);
                                        } else {
                                                pr_warning("%s:
Warning: bond's first port is uninitialized\n",
                                                           bond->dev->name);
                                                uninitialized = 1;
                                        }

                                }
                                if (! uninitialized)|
                                        bond_3ad_set_carrier(bond);
                        }

                        if (! uninitialized) {
                                // for each port run the state machines
                                for (port = __get_first_port(bond);
port; port = __get_next_port(port)) {
                                        if (port->slave) {

                                                /* Lock around state
machines to protect data accessed
                                                 * by all (e.g.,
port->sm_vars).  ad_rx_machine may run
                                                 * concurrently due to
incoming LACPDU.
                                                 */
                                                __get_state_machine_lock(port);

                                                ad_rx_machine(NULL, port);
                                                ad_periodic_machine(port);
                                                ad_port_selection_logic(port);
                                                ad_mux_machine(port);
                                                ad_tx_machine(port);

                                                // turn off the BEGIN
bit, since we already handled it
                                                if (port->sm_vars &
AD_PORT_BEGIN)
                                                        port->sm_vars
&= ~AD_PORT_BEGIN;


__release_state_machine_lock(port);
                                        } else {
                                                pr_warning("%s:
Warning: Found an uninitialized port\n",
                                                           bond->dev->name);
                                        }
                                }
                        }

                }
                queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
        }
        read_unlock(&bond->lock);
}

Notice for instance "aggregator = __get_first_agg(port);" now have 5
levels of indentation in addition to the function body level, compared
to just 2 in the original.

That extra indentation is one thing, but the most important in my opinion
is that layout of the code becomes harder to read. This is because the
error handling code will typically be much more intrusive in the "layout".
Good code written like

        if (error_check) {
                error
                handling
                code
        }
        normal
        code
        here

is possible to quickly scan/read, but if code is written like

        if (some_check) {
                error
                handling
                code
        } else {
                normal
                code
                here
        }

or

        if (some_check) {
                normal
                code
                here
        } else {
                error
                handling
                code
        }

you have to go into slow gear and read and mentally process (relatively)
much more of the code to get the same picture. Combine a few such if/else,
nested - now everything is tightly intervened and impossible to quickly
grasp and/or hold in your head.

And the "uninitialized" variable is a crutch.


BR Håkon Løvdal

^ permalink raw reply

* Re: [PATCH 1/2] ipv4: Pass flow keys down into datagram packet building engine.
From: Joe Perches @ 2011-05-09  1:14 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110508.180753.241908549.davem@davemloft.net>

On Sun, 8 May 2011, David Miller wrote:
> +static struct rtable *icmp_route_lookup(struct net *net,
> +					struct flowi4 *fl4,
> +					struct sk_buff *skb_in,
>  					const struct iphdr *iph,
>  					__be32 saddr, u8 tos,
>  					int type, int code,
>  					struct icmp_bxm *param)
>  {
[]
> +	memset(&fl4, 0, sizeof(fl4));

	memset(fl4, 0, sizeof(*fl4));

> +	fl4->daddr = (param->replyopts.opt.opt.srr ?
> +		      param->replyopts.opt.opt.faddr : iph->saddr);


^ permalink raw reply

* [PATCH 2/2] ipv4: Pass flow key down into ip_append_*().
From: David Miller @ 2011-05-09  1:07 UTC (permalink / raw)
  To: netdev


This way rt->rt_dst accesses are unnecessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip.h     |    4 ++--
 net/ipv4/icmp.c      |    2 +-
 net/ipv4/ip_output.c |   18 ++++++++++--------
 net/ipv4/raw.c       |    2 +-
 net/ipv4/udp.c       |   12 +++++++-----
 5 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index a425379..0b30d3a 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -106,7 +106,7 @@ extern int		__ip_local_out(struct sk_buff *skb);
 extern int		ip_local_out(struct sk_buff *skb);
 extern int		ip_queue_xmit(struct sk_buff *skb, struct flowi *fl);
 extern void		ip_init(void);
-extern int		ip_append_data(struct sock *sk,
+extern int		ip_append_data(struct sock *sk, struct flowi4 *fl4,
 				       int getfrag(void *from, char *to, int offset, int len,
 						   int odd, struct sk_buff *skb),
 				void *from, int len, int protolen,
@@ -114,7 +114,7 @@ extern int		ip_append_data(struct sock *sk,
 				struct rtable **rt,
 				unsigned int flags);
 extern int		ip_generic_getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb);
-extern ssize_t		ip_append_page(struct sock *sk, struct page *page,
+extern ssize_t		ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
 				int offset, size_t size, int flags);
 extern struct sk_buff  *__ip_make_skb(struct sock *sk,
 				      struct flowi4 *fl4,
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index b3dc6de..67ad66a 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -297,7 +297,7 @@ static void icmp_push_reply(struct icmp_bxm *icmp_param,
 	struct sk_buff *skb;
 
 	sk = icmp_sk(dev_net((*rt)->dst.dev));
-	if (ip_append_data(sk, icmp_glue_bits, icmp_param,
+	if (ip_append_data(sk, fl4, icmp_glue_bits, icmp_param,
 			   icmp_param->data_len+icmp_param->head_len,
 			   icmp_param->head_len,
 			   ipc, rt, MSG_DONTWAIT) < 0) {
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index dca637b..cd89d22 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -776,7 +776,9 @@ static inline int ip_ufo_append_data(struct sock *sk,
 				       (length - transhdrlen));
 }
 
-static int __ip_append_data(struct sock *sk, struct sk_buff_head *queue,
+static int __ip_append_data(struct sock *sk,
+			    struct flowi4 *fl4,
+			    struct sk_buff_head *queue,
 			    struct inet_cork *cork,
 			    int getfrag(void *from, char *to, int offset,
 					int len, int odd, struct sk_buff *skb),
@@ -808,7 +810,7 @@ static int __ip_append_data(struct sock *sk, struct sk_buff_head *queue,
 	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;
 
 	if (cork->length + length > 0xFFFF - fragheaderlen) {
-		ip_local_error(sk, EMSGSIZE, rt->rt_dst, inet->inet_dport,
+		ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport,
 			       mtu-exthdrlen);
 		return -EMSGSIZE;
 	}
@@ -1083,7 +1085,7 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
  *
  *	LATER: length must be adjusted by pad at tail, when it is required.
  */
-int ip_append_data(struct sock *sk,
+int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 		   int getfrag(void *from, char *to, int offset, int len,
 			       int odd, struct sk_buff *skb),
 		   void *from, int length, int transhdrlen,
@@ -1104,11 +1106,11 @@ int ip_append_data(struct sock *sk,
 		transhdrlen = 0;
 	}
 
-	return __ip_append_data(sk, &sk->sk_write_queue, &inet->cork.base, getfrag,
+	return __ip_append_data(sk, fl4, &sk->sk_write_queue, &inet->cork.base, getfrag,
 				from, length, transhdrlen, flags);
 }
 
-ssize_t	ip_append_page(struct sock *sk, struct page *page,
+ssize_t	ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
 		       int offset, size_t size, int flags)
 {
 	struct inet_sock *inet = inet_sk(sk);
@@ -1146,7 +1148,7 @@ ssize_t	ip_append_page(struct sock *sk, struct page *page,
 	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;
 
 	if (cork->length + size > 0xFFFF - fragheaderlen) {
-		ip_local_error(sk, EMSGSIZE, rt->rt_dst, inet->inet_dport, mtu);
+		ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport, mtu);
 		return -EMSGSIZE;
 	}
 
@@ -1427,7 +1429,7 @@ struct sk_buff *ip_make_skb(struct sock *sk,
 	if (err)
 		return ERR_PTR(err);
 
-	err = __ip_append_data(sk, &queue, &cork, getfrag,
+	err = __ip_append_data(sk, fl4, &queue, &cork, getfrag,
 			       from, length, transhdrlen, flags);
 	if (err) {
 		__ip_flush_pending_frames(sk, &queue, &cork);
@@ -1503,7 +1505,7 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
 	sk->sk_priority = skb->priority;
 	sk->sk_protocol = ip_hdr(skb)->protocol;
 	sk->sk_bound_dev_if = arg->bound_dev_if;
-	ip_append_data(sk, ip_reply_glue_bits, arg->iov->iov_base, len, 0,
+	ip_append_data(sk, &fl4, ip_reply_glue_bits, arg->iov->iov_base, len, 0,
 		       &ipc, &rt, MSG_DONTWAIT);
 	if ((skb = skb_peek(&sk->sk_write_queue)) != NULL) {
 		if (arg->csumoffset >= 0)
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 6fee91f..11e17804 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -595,7 +595,7 @@ back_from_confirm:
 		if (!ipc.addr)
 			ipc.addr = fl4.daddr;
 		lock_sock(sk);
-		err = ip_append_data(sk, ip_generic_getfrag,
+		err = ip_append_data(sk, &fl4, ip_generic_getfrag,
 				     msg->msg_iov, len, 0,
 				     &ipc, &rt, msg->msg_flags);
 		if (err)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 006e2cc..66341a3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -822,6 +822,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 
 	getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
 
+	fl4 = &inet->cork.fl.u.ip4;
 	if (up->pending) {
 		/*
 		 * There are pending frames.
@@ -920,7 +921,6 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	if (connected)
 		rt = (struct rtable *)sk_dst_check(sk, 0);
 
-	fl4 = &inet->cork.fl.u.ip4;
 	if (rt == NULL) {
 		struct net *net = sock_net(sk);
 
@@ -989,9 +989,9 @@ back_from_confirm:
 
 do_append_data:
 	up->len += ulen;
-	err = ip_append_data(sk, getfrag, msg->msg_iov, ulen,
-			sizeof(struct udphdr), &ipc, &rt,
-			corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
+	err = ip_append_data(sk, fl4, getfrag, msg->msg_iov, ulen,
+			     sizeof(struct udphdr), &ipc, &rt,
+			     corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
 	if (err)
 		udp_flush_pending_frames(sk);
 	else if (!corkreq)
@@ -1031,6 +1031,7 @@ EXPORT_SYMBOL(udp_sendmsg);
 int udp_sendpage(struct sock *sk, struct page *page, int offset,
 		 size_t size, int flags)
 {
+	struct inet_sock *inet = inet_sk(sk);
 	struct udp_sock *up = udp_sk(sk);
 	int ret;
 
@@ -1055,7 +1056,8 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
 		return -EINVAL;
 	}
 
-	ret = ip_append_page(sk, page, offset, size, flags);
+	ret = ip_append_page(sk, &inet->cork.fl.u.ip4,
+			     page, offset, size, flags);
 	if (ret == -EOPNOTSUPP) {
 		release_sock(sk);
 		return sock_no_sendpage(sk->sk_socket, page, offset,
-- 
1.7.5.1


^ permalink raw reply related

* [PATCH 1/2] ipv4: Pass flow keys down into datagram packet building engine.
From: David Miller @ 2011-05-09  1:07 UTC (permalink / raw)
  To: netdev


This way ip_output.c no longer needs rt->rt_{src,dst}.

We already have these keys sitting, ready and waiting, on the stack or
in a socket structure.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip.h     |    8 +++--
 net/ipv4/icmp.c      |   74 ++++++++++++++++++++++++-------------------------
 net/ipv4/ip_output.c |   39 +++++++++++++-------------
 net/ipv4/raw.c       |   59 +++++++++++++++++++--------------------
 net/ipv4/udp.c       |    4 +-
 5 files changed, 91 insertions(+), 93 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index acf8b78..a425379 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -117,12 +117,14 @@ extern int		ip_generic_getfrag(void *from, char *to, int offset, int len, int od
 extern ssize_t		ip_append_page(struct sock *sk, struct page *page,
 				int offset, size_t size, int flags);
 extern struct sk_buff  *__ip_make_skb(struct sock *sk,
+				      struct flowi4 *fl4,
 				      struct sk_buff_head *queue,
 				      struct inet_cork *cork);
 extern int		ip_send_skb(struct sk_buff *skb);
-extern int		ip_push_pending_frames(struct sock *sk);
+extern int		ip_push_pending_frames(struct sock *sk, struct flowi4 *fl4);
 extern void		ip_flush_pending_frames(struct sock *sk);
 extern struct sk_buff  *ip_make_skb(struct sock *sk,
+				    struct flowi4 *fl4,
 				    int getfrag(void *from, char *to, int offset, int len,
 						int odd, struct sk_buff *skb),
 				    void *from, int length, int transhdrlen,
@@ -130,9 +132,9 @@ extern struct sk_buff  *ip_make_skb(struct sock *sk,
 				    struct rtable **rtp,
 				    unsigned int flags);
 
-static inline struct sk_buff *ip_finish_skb(struct sock *sk)
+static inline struct sk_buff *ip_finish_skb(struct sock *sk, struct flowi4 *fl4)
 {
-	return __ip_make_skb(sk, &sk->sk_write_queue, &inet_sk(sk)->cork.base);
+	return __ip_make_skb(sk, fl4, &sk->sk_write_queue, &inet_sk(sk)->cork.base);
 }
 
 /* datagram.c */
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index cfeca3c..b3dc6de 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -290,6 +290,7 @@ static int icmp_glue_bits(void *from, char *to, int offset, int len, int odd,
 }
 
 static void icmp_push_reply(struct icmp_bxm *icmp_param,
+			    struct flowi4 *fl4,
 			    struct ipcm_cookie *ipc, struct rtable **rt)
 {
 	struct sock *sk;
@@ -315,7 +316,7 @@ static void icmp_push_reply(struct icmp_bxm *icmp_param,
 						 icmp_param->head_len, csum);
 		icmph->checksum = csum_fold(csum);
 		skb->ip_summed = CHECKSUM_NONE;
-		ip_push_pending_frames(sk);
+		ip_push_pending_frames(sk, fl4);
 	}
 }
 
@@ -328,6 +329,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	struct ipcm_cookie ipc;
 	struct rtable *rt = skb_rtable(skb);
 	struct net *net = dev_net(rt->dst.dev);
+	struct flowi4 fl4;
 	struct sock *sk;
 	struct inet_sock *inet;
 	__be32 daddr;
@@ -351,57 +353,52 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 		if (ipc.opt->opt.srr)
 			daddr = icmp_param->replyopts.opt.opt.faddr;
 	}
-	{
-		struct flowi4 fl4 = {
-			.daddr = daddr,
-			.saddr = rt->rt_spec_dst,
-			.flowi4_tos = RT_TOS(ip_hdr(skb)->tos),
-			.flowi4_proto = IPPROTO_ICMP,
-		};
-		security_skb_classify_flow(skb, flowi4_to_flowi(&fl4));
-		rt = ip_route_output_key(net, &fl4);
-		if (IS_ERR(rt))
-			goto out_unlock;
-	}
+	memset(&fl4, 0, sizeof(fl4));
+	fl4.daddr = daddr;
+	fl4.saddr = rt->rt_spec_dst;
+	fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
+	fl4.flowi4_proto = IPPROTO_ICMP;
+	security_skb_classify_flow(skb, flowi4_to_flowi(&fl4));
+	rt = ip_route_output_key(net, &fl4);
+	if (IS_ERR(rt))
+		goto out_unlock;
 	if (icmpv4_xrlim_allow(net, rt, icmp_param->data.icmph.type,
 			       icmp_param->data.icmph.code))
-		icmp_push_reply(icmp_param, &ipc, &rt);
+		icmp_push_reply(icmp_param, &fl4, &ipc, &rt);
 	ip_rt_put(rt);
 out_unlock:
 	icmp_xmit_unlock(sk);
 }
 
-static struct rtable *icmp_route_lookup(struct net *net, struct sk_buff *skb_in,
+static struct rtable *icmp_route_lookup(struct net *net,
+					struct flowi4 *fl4,
+					struct sk_buff *skb_in,
 					const struct iphdr *iph,
 					__be32 saddr, u8 tos,
 					int type, int code,
 					struct icmp_bxm *param)
 {
-	struct flowi4 fl4 = {
-		.daddr = (param->replyopts.opt.opt.srr ?
-			  param->replyopts.opt.opt.faddr : iph->saddr),
-		.saddr = saddr,
-		.flowi4_tos = RT_TOS(tos),
-		.flowi4_proto = IPPROTO_ICMP,
-		.fl4_icmp_type = type,
-		.fl4_icmp_code = code,
-	};
 	struct rtable *rt, *rt2;
 	int err;
 
-	security_skb_classify_flow(skb_in, flowi4_to_flowi(&fl4));
-	rt = __ip_route_output_key(net, &fl4);
+	memset(&fl4, 0, sizeof(fl4));
+	fl4->daddr = (param->replyopts.opt.opt.srr ?
+		      param->replyopts.opt.opt.faddr : iph->saddr);
+	fl4->saddr = saddr;
+	fl4->flowi4_tos = RT_TOS(tos);
+	fl4->flowi4_proto = IPPROTO_ICMP;
+	fl4->fl4_icmp_type = type;
+	fl4->fl4_icmp_code = code;
+	security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
+	rt = __ip_route_output_key(net, fl4);
 	if (IS_ERR(rt))
 		return rt;
 
 	/* No need to clone since we're just using its address. */
 	rt2 = rt;
 
-	if (!fl4.saddr)
-		fl4.saddr = rt->rt_src;
-
 	rt = (struct rtable *) xfrm_lookup(net, &rt->dst,
-					   flowi4_to_flowi(&fl4), NULL, 0);
+					   flowi4_to_flowi(fl4), NULL, 0);
 	if (!IS_ERR(rt)) {
 		if (rt != rt2)
 			return rt;
@@ -410,19 +407,19 @@ static struct rtable *icmp_route_lookup(struct net *net, struct sk_buff *skb_in,
 	} else
 		return rt;
 
-	err = xfrm_decode_session_reverse(skb_in, flowi4_to_flowi(&fl4), AF_INET);
+	err = xfrm_decode_session_reverse(skb_in, flowi4_to_flowi(fl4), AF_INET);
 	if (err)
 		goto relookup_failed;
 
-	if (inet_addr_type(net, fl4.saddr) == RTN_LOCAL) {
-		rt2 = __ip_route_output_key(net, &fl4);
+	if (inet_addr_type(net, fl4->saddr) == RTN_LOCAL) {
+		rt2 = __ip_route_output_key(net, fl4);
 		if (IS_ERR(rt2))
 			err = PTR_ERR(rt2);
 	} else {
 		struct flowi4 fl4_2 = {};
 		unsigned long orefdst;
 
-		fl4_2.daddr = fl4.saddr;
+		fl4_2.daddr = fl4->saddr;
 		rt2 = ip_route_output_key(net, &fl4_2);
 		if (IS_ERR(rt2)) {
 			err = PTR_ERR(rt2);
@@ -430,7 +427,7 @@ static struct rtable *icmp_route_lookup(struct net *net, struct sk_buff *skb_in,
 		}
 		/* Ugh! */
 		orefdst = skb_in->_skb_refdst; /* save old refdst */
-		err = ip_route_input(skb_in, fl4.daddr, fl4.saddr,
+		err = ip_route_input(skb_in, fl4->daddr, fl4->saddr,
 				     RT_TOS(tos), rt2->dst.dev);
 
 		dst_release(&rt2->dst);
@@ -442,7 +439,7 @@ static struct rtable *icmp_route_lookup(struct net *net, struct sk_buff *skb_in,
 		goto relookup_failed;
 
 	rt2 = (struct rtable *) xfrm_lookup(net, &rt2->dst,
-					    flowi4_to_flowi(&fl4), NULL,
+					    flowi4_to_flowi(fl4), NULL,
 					    XFRM_LOOKUP_ICMP);
 	if (!IS_ERR(rt2)) {
 		dst_release(&rt->dst);
@@ -481,6 +478,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	struct icmp_bxm icmp_param;
 	struct rtable *rt = skb_rtable(skb_in);
 	struct ipcm_cookie ipc;
+	struct flowi4 fl4;
 	__be32 saddr;
 	u8  tos;
 	struct net *net;
@@ -599,7 +597,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	ipc.opt = &icmp_param.replyopts.opt;
 	ipc.tx_flags = 0;
 
-	rt = icmp_route_lookup(net, skb_in, iph, saddr, tos,
+	rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos,
 			       type, code, &icmp_param);
 	if (IS_ERR(rt))
 		goto out_unlock;
@@ -620,7 +618,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 		icmp_param.data_len = room;
 	icmp_param.head_len = sizeof(struct icmphdr);
 
-	icmp_push_reply(&icmp_param, &ipc, &rt);
+	icmp_push_reply(&icmp_param, &fl4, &ipc, &rt);
 ende:
 	ip_rt_put(rt);
 out_unlock:
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b88ee5f..dca637b 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1267,6 +1267,7 @@ static void ip_cork_release(struct inet_cork *cork)
  *	and push them out.
  */
 struct sk_buff *__ip_make_skb(struct sock *sk,
+			      struct flowi4 *fl4,
 			      struct sk_buff_head *queue,
 			      struct inet_cork *cork)
 {
@@ -1333,8 +1334,8 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	ip_select_ident(iph, &rt->dst, sk);
 	iph->ttl = ttl;
 	iph->protocol = sk->sk_protocol;
-	iph->saddr = rt->rt_src;
-	iph->daddr = rt->rt_dst;
+	iph->saddr = fl4->saddr;
+	iph->daddr = fl4->daddr;
 
 	skb->priority = sk->sk_priority;
 	skb->mark = sk->sk_mark;
@@ -1370,11 +1371,11 @@ int ip_send_skb(struct sk_buff *skb)
 	return err;
 }
 
-int ip_push_pending_frames(struct sock *sk)
+int ip_push_pending_frames(struct sock *sk, struct flowi4 *fl4)
 {
 	struct sk_buff *skb;
 
-	skb = ip_finish_skb(sk);
+	skb = ip_finish_skb(sk, fl4);
 	if (!skb)
 		return 0;
 
@@ -1403,6 +1404,7 @@ void ip_flush_pending_frames(struct sock *sk)
 }
 
 struct sk_buff *ip_make_skb(struct sock *sk,
+			    struct flowi4 *fl4,
 			    int getfrag(void *from, char *to, int offset,
 					int len, int odd, struct sk_buff *skb),
 			    void *from, int length, int transhdrlen,
@@ -1432,7 +1434,7 @@ struct sk_buff *ip_make_skb(struct sock *sk,
 		return ERR_PTR(err);
 	}
 
-	return __ip_make_skb(sk, &queue, &cork);
+	return __ip_make_skb(sk, fl4, &queue, &cork);
 }
 
 /*
@@ -1461,6 +1463,7 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
 	struct inet_sock *inet = inet_sk(sk);
 	struct ip_options_data replyopts;
 	struct ipcm_cookie ipc;
+	struct flowi4 fl4;
 	__be32 daddr;
 	struct rtable *rt = skb_rtable(skb);
 
@@ -1478,20 +1481,16 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
 			daddr = replyopts.opt.opt.faddr;
 	}
 
-	{
-		struct flowi4 fl4;
-
-		flowi4_init_output(&fl4, arg->bound_dev_if, 0,
-				   RT_TOS(ip_hdr(skb)->tos),
-				   RT_SCOPE_UNIVERSE, sk->sk_protocol,
-				   ip_reply_arg_flowi_flags(arg),
-				   daddr, rt->rt_spec_dst,
-				   tcp_hdr(skb)->source, tcp_hdr(skb)->dest);
-		security_skb_classify_flow(skb, flowi4_to_flowi(&fl4));
-		rt = ip_route_output_key(sock_net(sk), &fl4);
-		if (IS_ERR(rt))
-			return;
-	}
+	flowi4_init_output(&fl4, arg->bound_dev_if, 0,
+			   RT_TOS(ip_hdr(skb)->tos),
+			   RT_SCOPE_UNIVERSE, sk->sk_protocol,
+			   ip_reply_arg_flowi_flags(arg),
+			   daddr, rt->rt_spec_dst,
+			   tcp_hdr(skb)->source, tcp_hdr(skb)->dest);
+	security_skb_classify_flow(skb, flowi4_to_flowi(&fl4));
+	rt = ip_route_output_key(sock_net(sk), &fl4);
+	if (IS_ERR(rt))
+		return;
 
 	/* And let IP do all the hard work.
 
@@ -1512,7 +1511,7 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
 			  arg->csumoffset) = csum_fold(csum_add(skb->csum,
 								arg->csum));
 		skb->ip_summed = CHECKSUM_NONE;
-		ip_push_pending_frames(sk);
+		ip_push_pending_frames(sk, &fl4);
 	}
 
 	bh_unlock_sock(sk);
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index a8659e0..6fee91f 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -314,9 +314,10 @@ int raw_rcv(struct sock *sk, struct sk_buff *skb)
 	return 0;
 }
 
-static int raw_send_hdrinc(struct sock *sk, void *from, size_t length,
-			struct rtable **rtp,
-			unsigned int flags)
+static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
+			   void *from, size_t length,
+			   struct rtable **rtp,
+			   unsigned int flags)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct net *net = sock_net(sk);
@@ -327,7 +328,7 @@ static int raw_send_hdrinc(struct sock *sk, void *from, size_t length,
 	struct rtable *rt = *rtp;
 
 	if (length > rt->dst.dev->mtu) {
-		ip_local_error(sk, EMSGSIZE, rt->rt_dst, inet->inet_dport,
+		ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport,
 			       rt->dst.dev->mtu);
 		return -EMSGSIZE;
 	}
@@ -372,7 +373,7 @@ static int raw_send_hdrinc(struct sock *sk, void *from, size_t length,
 
 	if (iphlen >= sizeof(*iph)) {
 		if (!iph->saddr)
-			iph->saddr = rt->rt_src;
+			iph->saddr = fl4->saddr;
 		iph->check   = 0;
 		iph->tot_len = htons(length);
 		if (!iph->id)
@@ -455,6 +456,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	struct inet_sock *inet = inet_sk(sk);
 	struct ipcm_cookie ipc;
 	struct rtable *rt = NULL;
+	struct flowi4 fl4;
 	int free = 0;
 	__be32 daddr;
 	__be32 saddr;
@@ -558,27 +560,23 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 			saddr = inet->mc_addr;
 	}
 
-	{
-		struct flowi4 fl4;
+	flowi4_init_output(&fl4, ipc.oif, sk->sk_mark, tos,
+			   RT_SCOPE_UNIVERSE,
+			   inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol,
+			   FLOWI_FLAG_CAN_SLEEP, daddr, saddr, 0, 0);
 
-		flowi4_init_output(&fl4, ipc.oif, sk->sk_mark, tos,
-				   RT_SCOPE_UNIVERSE,
-				   inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol,
-				   FLOWI_FLAG_CAN_SLEEP, daddr, saddr, 0, 0);
-
-		if (!inet->hdrincl) {
-			err = raw_probe_proto_opt(&fl4, msg);
-			if (err)
-				goto done;
-		}
-
-		security_sk_classify_flow(sk, flowi4_to_flowi(&fl4));
-		rt = ip_route_output_flow(sock_net(sk), &fl4, sk);
-		if (IS_ERR(rt)) {
-			err = PTR_ERR(rt);
-			rt = NULL;
+	if (!inet->hdrincl) {
+		err = raw_probe_proto_opt(&fl4, msg);
+		if (err)
 			goto done;
-		}
+	}
+
+	security_sk_classify_flow(sk, flowi4_to_flowi(&fl4));
+	rt = ip_route_output_flow(sock_net(sk), &fl4, sk);
+	if (IS_ERR(rt)) {
+		err = PTR_ERR(rt);
+		rt = NULL;
+		goto done;
 	}
 
 	err = -EACCES;
@@ -590,19 +588,20 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 back_from_confirm:
 
 	if (inet->hdrincl)
-		err = raw_send_hdrinc(sk, msg->msg_iov, len,
-					&rt, msg->msg_flags);
+		err = raw_send_hdrinc(sk, &fl4, msg->msg_iov, len,
+				      &rt, msg->msg_flags);
 
 	 else {
 		if (!ipc.addr)
-			ipc.addr = rt->rt_dst;
+			ipc.addr = fl4.daddr;
 		lock_sock(sk);
-		err = ip_append_data(sk, ip_generic_getfrag, msg->msg_iov, len, 0,
-					&ipc, &rt, msg->msg_flags);
+		err = ip_append_data(sk, ip_generic_getfrag,
+				     msg->msg_iov, len, 0,
+				     &ipc, &rt, msg->msg_flags);
 		if (err)
 			ip_flush_pending_frames(sk);
 		else if (!(msg->msg_flags & MSG_MORE)) {
-			err = ip_push_pending_frames(sk);
+			err = ip_push_pending_frames(sk, &fl4);
 			if (err == -ENOBUFS && !inet->recverr)
 				err = 0;
 		}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ba9f137..006e2cc 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -774,7 +774,7 @@ static int udp_push_pending_frames(struct sock *sk)
 	struct sk_buff *skb;
 	int err = 0;
 
-	skb = ip_finish_skb(sk);
+	skb = ip_finish_skb(sk, fl4);
 	if (!skb)
 		goto out;
 
@@ -958,7 +958,7 @@ back_from_confirm:
 
 	/* Lockless fast path for the non-corking case. */
 	if (!corkreq) {
-		skb = ip_make_skb(sk, getfrag, msg->msg_iov, ulen,
+		skb = ip_make_skb(sk, fl4, getfrag, msg->msg_iov, ulen,
 				  sizeof(struct udphdr), &ipc, &rt,
 				  msg->msg_flags);
 		err = PTR_ERR(skb);
-- 
1.7.5.1


^ permalink raw reply related

* [PATCH 0/2] Get flow key down into ipv4 datagram packet engine
From: David Miller @ 2011-05-09  1:07 UTC (permalink / raw)
  To: netdev


The really nice part about this patch set is that it will subsequently
allow us to kill a lot of on-stack stuff.

Parts of what we pass around in the flow key is also passed around in
this "ipcm_cookie" object we use for datagram/raw/icmp packet
construction.

So now we can perform some minimizations on ipcm_cookie as a result.

^ permalink raw reply

* Re: [PATCH] net/bonding: adjust codingstyle for bond_3ad files
From: David Miller @ 2011-05-09  0:12 UTC (permalink / raw)
  To: hlovdal
  Cc: mfmooney, joe, aquini, kernel-janitors, fubar, andy, shemminger,
	netdev, nikai
In-Reply-To: <BANLkTimao-Zyk6ypW3UxzqtsgmQ-6qvYng@mail.gmail.com>

From: Håkon Løvdal <hlovdal@gmail.com>
Date: Mon, 9 May 2011 02:08:41 +0200

> void bond_3ad_state_machine_handler(struct work_struct *work)
> {
>         struct bonding *bond = container_of(work, struct bonding,
>                                             ad_work.work);
>         struct port *port;
>         struct aggregator *aggregator;
> 
>         read_lock(&bond->lock);
> 
>         if (! bond->kill_timers) {
> 
>                 //check if there are any slaves
>                 if (bond->slave_cnt != 0) {
>                         ...
>                 }
>                 queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
>         }
>         read_unlock(&bond->lock);
> }
> 
> 
> And this was what I trying to reccommend against (and which the
> stackoverflow question is about).

I really don't see anything wrong with either approach.


^ permalink raw reply

* Re: [PATCH] net/bonding: adjust codingstyle for bond_3ad files
From: Håkon Løvdal @ 2011-05-09  0:08 UTC (permalink / raw)
  To: David Miller
  Cc: mfmooney, joe, aquini, kernel-janitors, fubar, andy, shemminger,
	netdev, nikai
In-Reply-To: <20110508.161012.258121848.davem@davemloft.net>

On 9 May 2011 01:10, David Miller <davem@davemloft.net> wrote:
> From: Håkon Løvdal <hlovdal@gmail.com>
> Date: Mon, 9 May 2011 01:08:44 +0200
>
>> On 7 May 2011 21:35, matt mooney <mfmooney@gmail.com> wrote:
>>> But isn't the preferred style to have a single exit point?
>>
>> This is generally considered to be a bad advice, see
>> http://stackoverflow.com/questions/1701686/why-should-methods-have-a-single-entry-and-exit-points/1701721#1701721
>> for instance.
>
> That article totally ignores the issue of locking and how hard it is
> to get right without single exit points, and how unlocking in
> multiple spots bloats up the code.
>
> Definitely don't take that article's advice when working on the
> kernel.
>

I think we agree, but my answer was probably too short, unclear and
imprecise. In the case of locking and single exit points in the kernel,
they are (almost always) reached through goto/labels, and this is a fine
way of handling exiting a function, e.g.


void bond_3ad_state_machine_handler(struct work_struct *work)
{
        struct bonding *bond = container_of(work, struct bonding,
                                            ad_work.work);
        struct port *port;
        struct aggregator *aggregator;

        read_lock(&bond->lock);

        if (bond->kill_timers)
                goto out;

        //check if there are any slaves
        if (bond->slave_cnt == 0)
              goto re_arm;

        ...

re_arm:
        queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
out:
        read_unlock(&bond->lock);
}


I often advocate usage of goto to achive this kind of style.

What I assosiate with "writing a function as single exit style" would
be something like the following (and usually littered with
temporary remember-this-for-later variables).


void bond_3ad_state_machine_handler(struct work_struct *work)
{
        struct bonding *bond = container_of(work, struct bonding,
                                            ad_work.work);
        struct port *port;
        struct aggregator *aggregator;

        read_lock(&bond->lock);

        if (! bond->kill_timers) {

                //check if there are any slaves
                if (bond->slave_cnt != 0) {
                        ...
                }
                queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
        }
        read_unlock(&bond->lock);
}


And this was what I trying to reccommend against (and which the
stackoverflow question is about). So most probably my assosiasion was too
implicit to make my reply useful. That was not the intention, hopefully
this followup clears up a little.

BR Håkon Løvdal

^ permalink raw reply

* [PATCH] udp: Use flow key information instead of rt->rt_{src,dst}
From: David Miller @ 2011-05-08 23:40 UTC (permalink / raw)
  To: netdev


We have two cases.

Either the socket is in TCP_ESTABLISHED state and connect() filled
in the inet socket cork flow, or we looked up the route here and
used an on-stack flow.

Track which one it was, and use it to obtain src/dst addrs.

Signed-off-by: David S. Miller <davem@davemloft.net>
---

This is a prelude to passing a flow key down into the datagram
packet sending engine interfaces in ip_output.c

 net/ipv4/udp.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 544f435..ba9f137 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -791,6 +791,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct udp_sock *up = udp_sk(sk);
+	struct flowi4 fl4_stack;
 	struct flowi4 *fl4;
 	int ulen = len;
 	struct ipcm_cookie ipc;
@@ -919,17 +920,18 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	if (connected)
 		rt = (struct rtable *)sk_dst_check(sk, 0);
 
+	fl4 = &inet->cork.fl.u.ip4;
 	if (rt == NULL) {
-		struct flowi4 fl4;
 		struct net *net = sock_net(sk);
 
-		flowi4_init_output(&fl4, ipc.oif, sk->sk_mark, tos,
+		fl4 = &fl4_stack;
+		flowi4_init_output(fl4, ipc.oif, sk->sk_mark, tos,
 				   RT_SCOPE_UNIVERSE, sk->sk_protocol,
 				   inet_sk_flowi_flags(sk)|FLOWI_FLAG_CAN_SLEEP,
 				   faddr, saddr, dport, inet->inet_sport);
 
-		security_sk_classify_flow(sk, flowi4_to_flowi(&fl4));
-		rt = ip_route_output_flow(net, &fl4, sk);
+		security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
+		rt = ip_route_output_flow(net, fl4, sk);
 		if (IS_ERR(rt)) {
 			err = PTR_ERR(rt);
 			rt = NULL;
@@ -950,9 +952,9 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		goto do_confirm;
 back_from_confirm:
 
-	saddr = rt->rt_src;
+	saddr = fl4->saddr;
 	if (!ipc.addr)
-		daddr = ipc.addr = rt->rt_dst;
+		daddr = ipc.addr = fl4->daddr;
 
 	/* Lockless fast path for the non-corking case. */
 	if (!corkreq) {
-- 
1.7.5.1


^ permalink raw reply related

* [PATCH 2/2] IPVS: init and cleanup restructuring
From: Simon Horman @ 2011-05-08 23:17 UTC (permalink / raw)
  To: lvs-devel, netdev, netfilter-devel, netfilter
  Cc: Wensong Zhang, Julian Anastasov, Patrick McHardy,
	Hans Schillstrom, Hans Schillstrom, Eric W. Biederman,
	Simon Horman
In-Reply-To: <1304896671-29384-1-git-send-email-horms@verge.net.au>

From: Hans Schillstrom <hans@schillstrom.com>

DESCRIPTION
This patch tries to restore the initial init and cleanup
sequences that was before namspace patch.
Netns also requires action when net devices unregister
which has never been implemented. I.e this patch also
covers when a device moves into a network namespace,
and has to be released.

IMPLEMENTATION
The number of calls to register_pernet_device have been
reduced to one for the ip_vs.ko
Schedulers still have their own calls.

This patch adds a function __ip_vs_service_cleanup()
and an enable flag for the netfilter hooks.

The nf hooks will be enabled when the first service is loaded
and never disabled again, except when a namespace exit starts.

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
[horms@verge.net.au: minor edit to changelog]
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h              |   17 +++++
 net/netfilter/ipvs/ip_vs_app.c   |   15 +----
 net/netfilter/ipvs/ip_vs_conn.c  |   12 +---
 net/netfilter/ipvs/ip_vs_core.c  |  101 +++++++++++++++++++++++++++++---
 net/netfilter/ipvs/ip_vs_ctl.c   |  120 ++++++++++++++++++++++++++++++++------
 net/netfilter/ipvs/ip_vs_est.c   |   14 +----
 net/netfilter/ipvs/ip_vs_proto.c |   11 +---
 net/netfilter/ipvs/ip_vs_sync.c  |   13 +---
 8 files changed, 223 insertions(+), 80 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 4d1b71a..02f6702 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -791,6 +791,7 @@ struct ip_vs_app {
 /* IPVS in network namespace */
 struct netns_ipvs {
 	int			gen;		/* Generation */
+	int			enable;		/* enable like nf_hooks do */
 	/*
 	 *	Hash table: for real service lookups
 	 */
@@ -1089,6 +1090,22 @@ ip_vs_control_add(struct ip_vs_conn *cp, struct ip_vs_conn *ctl_cp)
 	atomic_inc(&ctl_cp->n_control);
 }
 
+/*
+ * IPVS netns init & cleanup functions
+ */
+extern int __ip_vs_estimator_init(struct net *net);
+extern int __ip_vs_control_init(struct net *net);
+extern int __ip_vs_protocol_init(struct net *net);
+extern int __ip_vs_app_init(struct net *net);
+extern int __ip_vs_conn_init(struct net *net);
+extern int __ip_vs_sync_init(struct net *net);
+extern void __ip_vs_conn_cleanup(struct net *net);
+extern void __ip_vs_app_cleanup(struct net *net);
+extern void __ip_vs_protocol_cleanup(struct net *net);
+extern void __ip_vs_control_cleanup(struct net *net);
+extern void __ip_vs_estimator_cleanup(struct net *net);
+extern void __ip_vs_sync_cleanup(struct net *net);
+extern void __ip_vs_service_cleanup(struct net *net);
 
 /*
  *      IPVS application functions
diff --git a/net/netfilter/ipvs/ip_vs_app.c b/net/netfilter/ipvs/ip_vs_app.c
index 2dc6de1..51f3af7 100644
--- a/net/netfilter/ipvs/ip_vs_app.c
+++ b/net/netfilter/ipvs/ip_vs_app.c
@@ -576,7 +576,7 @@ static const struct file_operations ip_vs_app_fops = {
 };
 #endif
 
-static int __net_init __ip_vs_app_init(struct net *net)
+int __net_init __ip_vs_app_init(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -585,26 +585,17 @@ static int __net_init __ip_vs_app_init(struct net *net)
 	return 0;
 }
 
-static void __net_exit __ip_vs_app_cleanup(struct net *net)
+void __net_exit __ip_vs_app_cleanup(struct net *net)
 {
 	proc_net_remove(net, "ip_vs_app");
 }
 
-static struct pernet_operations ip_vs_app_ops = {
-	.init = __ip_vs_app_init,
-	.exit = __ip_vs_app_cleanup,
-};
-
 int __init ip_vs_app_init(void)
 {
-	int rv;
-
-	rv = register_pernet_subsys(&ip_vs_app_ops);
-	return rv;
+	return 0;
 }
 
 
 void ip_vs_app_cleanup(void)
 {
-	unregister_pernet_subsys(&ip_vs_app_ops);
 }
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index f289306..5092505 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -1258,22 +1258,17 @@ int __net_init __ip_vs_conn_init(struct net *net)
 	return 0;
 }
 
-static void __net_exit __ip_vs_conn_cleanup(struct net *net)
+void __net_exit __ip_vs_conn_cleanup(struct net *net)
 {
 	/* flush all the connection entries first */
 	ip_vs_conn_flush(net);
 	proc_net_remove(net, "ip_vs_conn");
 	proc_net_remove(net, "ip_vs_conn_sync");
 }
-static struct pernet_operations ipvs_conn_ops = {
-	.init = __ip_vs_conn_init,
-	.exit = __ip_vs_conn_cleanup,
-};
 
 int __init ip_vs_conn_init(void)
 {
 	int idx;
-	int retc;
 
 	/* Compute size and mask */
 	ip_vs_conn_tab_size = 1 << ip_vs_conn_tab_bits;
@@ -1309,17 +1304,14 @@ int __init ip_vs_conn_init(void)
 		rwlock_init(&__ip_vs_conntbl_lock_array[idx].l);
 	}
 
-	retc = register_pernet_subsys(&ipvs_conn_ops);
-
 	/* calculate the random value for connection hash */
 	get_random_bytes(&ip_vs_conn_rnd, sizeof(ip_vs_conn_rnd));
 
-	return retc;
+	return 0;
 }
 
 void ip_vs_conn_cleanup(void)
 {
-	unregister_pernet_subsys(&ipvs_conn_ops);
 	/* Release the empty cache */
 	kmem_cache_destroy(ip_vs_conn_cachep);
 	vfree(ip_vs_conn_tab);
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index a0791dc..a74dae6 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1113,6 +1113,9 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
 		return NF_ACCEPT;
 
 	net = skb_net(skb);
+	if (!net_ipvs(net)->enable)
+		return NF_ACCEPT;
+
 	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6) {
@@ -1343,6 +1346,7 @@ ip_vs_in_icmp(struct sk_buff *skb, int *related, unsigned int hooknum)
 		return NF_ACCEPT; /* The packet looks wrong, ignore */
 
 	net = skb_net(skb);
+
 	pd = ip_vs_proto_data_get(net, cih->protocol);
 	if (!pd)
 		return NF_ACCEPT;
@@ -1529,6 +1533,11 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 			      IP_VS_DBG_ADDR(af, &iph.daddr), hooknum);
 		return NF_ACCEPT;
 	}
+	/* ipvs enabled in this netns ? */
+	net = skb_net(skb);
+	if (!net_ipvs(net)->enable)
+		return NF_ACCEPT;
+
 	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
 
 	/* Bad... Do not break raw sockets */
@@ -1562,7 +1571,6 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 			ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
 		}
 
-	net = skb_net(skb);
 	/* Protocol supported? */
 	pd = ip_vs_proto_data_get(net, iph.protocol);
 	if (unlikely(!pd))
@@ -1588,7 +1596,6 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	}
 
 	IP_VS_DBG_PKT(11, af, pp, skb, 0, "Incoming packet");
-	net = skb_net(skb);
 	ipvs = net_ipvs(net);
 	/* Check the server status */
 	if (cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
@@ -1743,10 +1750,16 @@ ip_vs_forward_icmp(unsigned int hooknum, struct sk_buff *skb,
 		   int (*okfn)(struct sk_buff *))
 {
 	int r;
+	struct net *net;
 
 	if (ip_hdr(skb)->protocol != IPPROTO_ICMP)
 		return NF_ACCEPT;
 
+	/* ipvs enabled in this netns ? */
+	net = skb_net(skb);
+	if (!net_ipvs(net)->enable)
+		return NF_ACCEPT;
+
 	return ip_vs_in_icmp(skb, &r, hooknum);
 }
 
@@ -1757,10 +1770,16 @@ ip_vs_forward_icmp_v6(unsigned int hooknum, struct sk_buff *skb,
 		      int (*okfn)(struct sk_buff *))
 {
 	int r;
+	struct net *net;
 
 	if (ipv6_hdr(skb)->nexthdr != IPPROTO_ICMPV6)
 		return NF_ACCEPT;
 
+	/* ipvs enabled in this netns ? */
+	net = skb_net(skb);
+	if (!net_ipvs(net)->enable)
+		return NF_ACCEPT;
+
 	return ip_vs_in_icmp_v6(skb, &r, hooknum);
 }
 #endif
@@ -1884,21 +1903,72 @@ static int __net_init __ip_vs_init(struct net *net)
 		pr_err("%s(): no memory.\n", __func__);
 		return -ENOMEM;
 	}
+	/* Hold the beast until a service is registerd */
+	ipvs->enable = 0;
 	ipvs->net = net;
 	/* Counters used for creating unique names */
 	ipvs->gen = atomic_read(&ipvs_netns_cnt);
 	atomic_inc(&ipvs_netns_cnt);
 	net->ipvs = ipvs;
+
+	if (__ip_vs_estimator_init(net) < 0)
+		goto estimator_fail;
+
+	if (__ip_vs_control_init(net) < 0)
+		goto control_fail;
+
+	if (__ip_vs_protocol_init(net) < 0)
+		goto protocol_fail;
+
+	if (__ip_vs_app_init(net) < 0)
+		goto app_fail;
+
+	if (__ip_vs_conn_init(net) < 0)
+		goto conn_fail;
+
+	if (__ip_vs_sync_init(net) < 0)
+		goto sync_fail;
+
 	printk(KERN_INFO "IPVS: Creating netns size=%zu id=%d\n",
 			 sizeof(struct netns_ipvs), ipvs->gen);
 	return 0;
+/*
+ * Error handling
+ */
+
+sync_fail:
+	__ip_vs_conn_cleanup(net);
+conn_fail:
+	__ip_vs_app_cleanup(net);
+app_fail:
+	__ip_vs_protocol_cleanup(net);
+protocol_fail:
+	__ip_vs_control_cleanup(net);
+control_fail:
+	__ip_vs_estimator_cleanup(net);
+estimator_fail:
+	return -ENOMEM;
 }
 
 static void __net_exit __ip_vs_cleanup(struct net *net)
 {
+	__ip_vs_service_cleanup(net);	/* ip_vs_flush() with locks */
+	__ip_vs_conn_cleanup(net);
+	__ip_vs_app_cleanup(net);
+	__ip_vs_protocol_cleanup(net);
+	__ip_vs_control_cleanup(net);
+	__ip_vs_estimator_cleanup(net);
 	IP_VS_DBG(2, "ipvs netns %d released\n", net_ipvs(net)->gen);
 }
 
+static void __net_exit __ip_vs_dev_cleanup(struct net *net)
+{
+	EnterFunction(2);
+	net_ipvs(net)->enable = 0;	/* Disable packet reception */
+	__ip_vs_sync_cleanup(net);
+	LeaveFunction(2);
+}
+
 static struct pernet_operations ipvs_core_ops = {
 	.init = __ip_vs_init,
 	.exit = __ip_vs_cleanup,
@@ -1906,6 +1976,10 @@ static struct pernet_operations ipvs_core_ops = {
 	.size = sizeof(struct netns_ipvs),
 };
 
+static struct pernet_operations ipvs_core_dev_ops = {
+	.exit = __ip_vs_dev_cleanup,
+};
+
 /*
  *	Initialize IP Virtual Server
  */
@@ -1913,10 +1987,6 @@ static int __init ip_vs_init(void)
 {
 	int ret;
 
-	ret = register_pernet_subsys(&ipvs_core_ops);	/* Alloc ip_vs struct */
-	if (ret < 0)
-		return ret;
-
 	ip_vs_estimator_init();
 	ret = ip_vs_control_init();
 	if (ret < 0) {
@@ -1944,15 +2014,28 @@ static int __init ip_vs_init(void)
 		goto cleanup_conn;
 	}
 
+	ret = register_pernet_subsys(&ipvs_core_ops);	/* Alloc ip_vs struct */
+	if (ret < 0)
+		goto cleanup_sync;
+
+	ret = register_pernet_device(&ipvs_core_dev_ops);
+	if (ret < 0)
+		goto cleanup_sub;
+
 	ret = nf_register_hooks(ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
 	if (ret < 0) {
 		pr_err("can't register hooks.\n");
-		goto cleanup_sync;
+		goto cleanup_dev;
 	}
 
 	pr_info("ipvs loaded.\n");
+
 	return ret;
 
+cleanup_dev:
+	unregister_pernet_device(&ipvs_core_dev_ops);
+cleanup_sub:
+	unregister_pernet_subsys(&ipvs_core_ops);
 cleanup_sync:
 	ip_vs_sync_cleanup();
   cleanup_conn:
@@ -1964,20 +2047,20 @@ cleanup_sync:
 	ip_vs_control_cleanup();
   cleanup_estimator:
 	ip_vs_estimator_cleanup();
-	unregister_pernet_subsys(&ipvs_core_ops);	/* free ip_vs struct */
 	return ret;
 }
 
 static void __exit ip_vs_cleanup(void)
 {
 	nf_unregister_hooks(ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
+	unregister_pernet_device(&ipvs_core_dev_ops);
+	unregister_pernet_subsys(&ipvs_core_ops);	/* free ip_vs struct */
 	ip_vs_sync_cleanup();
 	ip_vs_conn_cleanup();
 	ip_vs_app_cleanup();
 	ip_vs_protocol_cleanup();
 	ip_vs_control_cleanup();
 	ip_vs_estimator_cleanup();
-	unregister_pernet_subsys(&ipvs_core_ops);	/* free ip_vs struct */
 	pr_info("ipvs unloaded.\n");
 }
 
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index ae47090..ea72281 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -69,6 +69,11 @@ int ip_vs_get_debug_level(void)
 }
 #endif
 
+
+/*  Protos */
+static void __ip_vs_del_service(struct ip_vs_service *svc);
+
+
 #ifdef CONFIG_IP_VS_IPV6
 /* Taken from rt6_fill_node() in net/ipv6/route.c, is there a better way? */
 static int __ip_vs_addr_is_local_v6(struct net *net,
@@ -1214,6 +1219,8 @@ ip_vs_add_service(struct net *net, struct ip_vs_service_user_kern *u,
 	write_unlock_bh(&__ip_vs_svc_lock);
 
 	*svc_p = svc;
+	/* Now there is a service - full throttle */
+	ipvs->enable = 1;
 	return 0;
 
 
@@ -1472,6 +1479,84 @@ static int ip_vs_flush(struct net *net)
 	return 0;
 }
 
+/*
+ *	Delete service by {netns} in the service table.
+ *	Called by __ip_vs_cleanup()
+ */
+void __ip_vs_service_cleanup(struct net *net)
+{
+	EnterFunction(2);
+	/* Check for "full" addressed entries */
+	mutex_lock(&__ip_vs_mutex);
+	ip_vs_flush(net);
+	mutex_unlock(&__ip_vs_mutex);
+	LeaveFunction(2);
+}
+/*
+ * Release dst hold by dst_cache
+ */
+static inline void
+__ip_vs_dev_reset(struct ip_vs_dest *dest, struct net_device *dev)
+{
+	spin_lock_bh(&dest->dst_lock);
+	if (dest->dst_cache && dest->dst_cache->dev == dev) {
+		IP_VS_DBG_BUF(3, "Reset dev:%s dest %s:%u ,dest->refcnt=%d\n",
+			      dev->name,
+			      IP_VS_DBG_ADDR(dest->af, &dest->addr),
+			      ntohs(dest->port),
+			      atomic_read(&dest->refcnt));
+		ip_vs_dst_reset(dest);
+	}
+	spin_unlock_bh(&dest->dst_lock);
+
+}
+/*
+ * Netdev event receiver
+ * Currently only NETDEV_UNREGISTER is handled, i.e. if we hold a reference to
+ * a device that is "unregister" it must be released.
+ */
+static int ip_vs_dst_event(struct notifier_block *this, unsigned long event,
+			    void *ptr)
+{
+	struct net_device *dev = ptr;
+	struct net *net = dev_net(dev);
+	struct ip_vs_service *svc;
+	struct ip_vs_dest *dest;
+	unsigned int idx;
+
+	if (event != NETDEV_UNREGISTER)
+		return NOTIFY_DONE;
+	IP_VS_DBG(3, "%s() dev=%s\n", __func__, dev->name);
+	EnterFunction(2);
+	mutex_lock(&__ip_vs_mutex);
+	for (idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
+		list_for_each_entry(svc, &ip_vs_svc_table[idx], s_list) {
+			if (net_eq(svc->net, net)) {
+				list_for_each_entry(dest, &svc->destinations,
+						    n_list) {
+					__ip_vs_dev_reset(dest, dev);
+				}
+			}
+		}
+
+		list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
+			if (net_eq(svc->net, net)) {
+				list_for_each_entry(dest, &svc->destinations,
+						    n_list) {
+					__ip_vs_dev_reset(dest, dev);
+				}
+			}
+
+		}
+	}
+
+	list_for_each_entry(dest, &net_ipvs(net)->dest_trash, n_list) {
+		__ip_vs_dev_reset(dest, dev);
+	}
+	mutex_unlock(&__ip_vs_mutex);
+	LeaveFunction(2);
+	return NOTIFY_DONE;
+}
 
 /*
  *	Zero counters in a service or all services
@@ -3588,6 +3673,10 @@ void __net_init __ip_vs_control_cleanup_sysctl(struct net *net) { }
 
 #endif
 
+static struct notifier_block ip_vs_dst_notifier = {
+	.notifier_call = ip_vs_dst_event,
+};
+
 int __net_init __ip_vs_control_init(struct net *net)
 {
 	int idx;
@@ -3626,7 +3715,7 @@ err:
 	return -ENOMEM;
 }
 
-static void __net_exit __ip_vs_control_cleanup(struct net *net)
+void __net_exit __ip_vs_control_cleanup(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -3639,11 +3728,6 @@ static void __net_exit __ip_vs_control_cleanup(struct net *net)
 	free_percpu(ipvs->tot_stats.cpustats);
 }
 
-static struct pernet_operations ipvs_control_ops = {
-	.init = __ip_vs_control_init,
-	.exit = __ip_vs_control_cleanup,
-};
-
 int __init ip_vs_control_init(void)
 {
 	int idx;
@@ -3657,33 +3741,32 @@ int __init ip_vs_control_init(void)
 		INIT_LIST_HEAD(&ip_vs_svc_fwm_table[idx]);
 	}
 
-	ret = register_pernet_subsys(&ipvs_control_ops);
-	if (ret) {
-		pr_err("cannot register namespace.\n");
-		goto err;
-	}
-
 	smp_wmb();	/* Do we really need it now ? */
 
 	ret = nf_register_sockopt(&ip_vs_sockopts);
 	if (ret) {
 		pr_err("cannot register sockopt.\n");
-		goto err_net;
+		goto err_sock;
 	}
 
 	ret = ip_vs_genl_register();
 	if (ret) {
 		pr_err("cannot register Generic Netlink interface.\n");
-		nf_unregister_sockopt(&ip_vs_sockopts);
-		goto err_net;
+		goto err_genl;
 	}
 
+	ret = register_netdevice_notifier(&ip_vs_dst_notifier);
+	if (ret < 0)
+		goto err_notf;
+
 	LeaveFunction(2);
 	return 0;
 
-err_net:
-	unregister_pernet_subsys(&ipvs_control_ops);
-err:
+err_notf:
+	ip_vs_genl_unregister();
+err_genl:
+	nf_unregister_sockopt(&ip_vs_sockopts);
+err_sock:
 	return ret;
 }
 
@@ -3691,7 +3774,6 @@ err:
 void ip_vs_control_cleanup(void)
 {
 	EnterFunction(2);
-	unregister_pernet_subsys(&ipvs_control_ops);
 	ip_vs_genl_unregister();
 	nf_unregister_sockopt(&ip_vs_sockopts);
 	LeaveFunction(2);
diff --git a/net/netfilter/ipvs/ip_vs_est.c b/net/netfilter/ipvs/ip_vs_est.c
index 8c8766c..508cce9 100644
--- a/net/netfilter/ipvs/ip_vs_est.c
+++ b/net/netfilter/ipvs/ip_vs_est.c
@@ -192,7 +192,7 @@ void ip_vs_read_estimator(struct ip_vs_stats_user *dst,
 	dst->outbps = (e->outbps + 0xF) >> 5;
 }
 
-static int __net_init __ip_vs_estimator_init(struct net *net)
+int __net_init __ip_vs_estimator_init(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -203,24 +203,16 @@ static int __net_init __ip_vs_estimator_init(struct net *net)
 	return 0;
 }
 
-static void __net_exit __ip_vs_estimator_exit(struct net *net)
+void __net_exit __ip_vs_estimator_cleanup(struct net *net)
 {
 	del_timer_sync(&net_ipvs(net)->est_timer);
 }
-static struct pernet_operations ip_vs_app_ops = {
-	.init = __ip_vs_estimator_init,
-	.exit = __ip_vs_estimator_exit,
-};
 
 int __init ip_vs_estimator_init(void)
 {
-	int rv;
-
-	rv = register_pernet_subsys(&ip_vs_app_ops);
-	return rv;
+	return 0;
 }
 
 void ip_vs_estimator_cleanup(void)
 {
-	unregister_pernet_subsys(&ip_vs_app_ops);
 }
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 17484a4..eb86028 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -316,7 +316,7 @@ ip_vs_tcpudp_debug_packet(int af, struct ip_vs_protocol *pp,
 /*
  * per network name-space init
  */
-static int __net_init __ip_vs_protocol_init(struct net *net)
+int __net_init __ip_vs_protocol_init(struct net *net)
 {
 #ifdef CONFIG_IP_VS_PROTO_TCP
 	register_ip_vs_proto_netns(net, &ip_vs_protocol_tcp);
@@ -336,7 +336,7 @@ static int __net_init __ip_vs_protocol_init(struct net *net)
 	return 0;
 }
 
-static void __net_exit __ip_vs_protocol_cleanup(struct net *net)
+void __net_exit __ip_vs_protocol_cleanup(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 	struct ip_vs_proto_data *pd;
@@ -349,11 +349,6 @@ static void __net_exit __ip_vs_protocol_cleanup(struct net *net)
 	}
 }
 
-static struct pernet_operations ipvs_proto_ops = {
-	.init = __ip_vs_protocol_init,
-	.exit = __ip_vs_protocol_cleanup,
-};
-
 int __init ip_vs_protocol_init(void)
 {
 	char protocols[64];
@@ -382,7 +377,6 @@ int __init ip_vs_protocol_init(void)
 	REGISTER_PROTOCOL(&ip_vs_protocol_esp);
 #endif
 	pr_info("Registered protocols (%s)\n", &protocols[2]);
-	return register_pernet_subsys(&ipvs_proto_ops);
 
 	return 0;
 }
@@ -393,7 +387,6 @@ void ip_vs_protocol_cleanup(void)
 	struct ip_vs_protocol *pp;
 	int i;
 
-	unregister_pernet_subsys(&ipvs_proto_ops);
 	/* unregister all the ipvs protocols */
 	for (i = 0; i < IP_VS_PROTO_TAB_SIZE; i++) {
 		while ((pp = ip_vs_proto_table[i]) != NULL)
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 0cce953..e292e5b 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1663,7 +1663,7 @@ int stop_sync_thread(struct net *net, int state)
 /*
  * Initialize data struct for each netns
  */
-static int __net_init __ip_vs_sync_init(struct net *net)
+int __net_init __ip_vs_sync_init(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -1677,7 +1677,7 @@ static int __net_init __ip_vs_sync_init(struct net *net)
 	return 0;
 }
 
-static void __ip_vs_sync_cleanup(struct net *net)
+void __ip_vs_sync_cleanup(struct net *net)
 {
 	int retc;
 
@@ -1690,18 +1690,11 @@ static void __ip_vs_sync_cleanup(struct net *net)
 		pr_err("Failed to stop Backup Daemon\n");
 }
 
-static struct pernet_operations ipvs_sync_ops = {
-	.init = __ip_vs_sync_init,
-	.exit = __ip_vs_sync_cleanup,
-};
-

^ permalink raw reply related

* [PATCH 1/2] IPVS: Change of socket usage to enable name space exit.
From: Simon Horman @ 2011-05-08 23:17 UTC (permalink / raw)
  To: lvs-devel, netdev, netfilter-devel, netfilter
  Cc: Wensong Zhang, Julian Anastasov, Patrick McHardy,
	Hans Schillstrom, Hans Schillstrom, Eric W. Biederman,
	Simon Horman
In-Reply-To: <1304896671-29384-1-git-send-email-horms@verge.net.au>

From: Hans Schillstrom <hans@schillstrom.com>

If the sync daemons run in a name space while it crashes
or get killed, there is no way to stop them except for a reboot.
When all patches are there, ip_vs_core will handle register_pernet_(),
i.e. ip_vs_sync_init() and ip_vs_sync_cleanup() will be removed.

Kernel threads should not increment the use count of a socket.
By calling sk_change_net() after creating a socket this is avoided.
sock_release cant be used intead sk_release_kernel() should be used.

Thanks Eric W Biederman for your advices.

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
[horms@verge.net.au: minor edit to changelog]
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_core.c |    2 +-
 net/netfilter/ipvs/ip_vs_sync.c |   58 +++++++++++++++++++++++++--------------
 2 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 07accf6..a0791dc 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1896,7 +1896,7 @@ static int __net_init __ip_vs_init(struct net *net)
 
 static void __net_exit __ip_vs_cleanup(struct net *net)
 {
-	IP_VS_DBG(10, "ipvs netns %d released\n", net_ipvs(net)->gen);
+	IP_VS_DBG(2, "ipvs netns %d released\n", net_ipvs(net)->gen);
 }
 
 static struct pernet_operations ipvs_core_ops = {
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 3e7961e..0cce953 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1303,13 +1303,18 @@ static struct socket *make_send_sock(struct net *net)
 	struct socket *sock;
 	int result;
 
-	/* First create a socket */
-	result = __sock_create(net, PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock, 1);
+	/* First create a socket move it to right name space later */
+	result = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
 	if (result < 0) {
 		pr_err("Error during creation of socket; terminating\n");
 		return ERR_PTR(result);
 	}
-
+	/*
+	 * Kernel sockets that are a part of a namespace, should not
+	 * hold a reference to a namespace in order to allow to stop it.
+	 * After sk_change_net should be released using sk_release_kernel.
+	 */
+	sk_change_net(sock->sk, net);
 	result = set_mcast_if(sock->sk, ipvs->master_mcast_ifn);
 	if (result < 0) {
 		pr_err("Error setting outbound mcast interface\n");
@@ -1334,8 +1339,8 @@ static struct socket *make_send_sock(struct net *net)
 
 	return sock;
 
-  error:
-	sock_release(sock);
+error:
+	sk_release_kernel(sock->sk);
 	return ERR_PTR(result);
 }
 
@@ -1350,12 +1355,17 @@ static struct socket *make_receive_sock(struct net *net)
 	int result;
 
 	/* First create a socket */
-	result = __sock_create(net, PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock, 1);
+	result = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
 	if (result < 0) {
 		pr_err("Error during creation of socket; terminating\n");
 		return ERR_PTR(result);
 	}
-
+	/*
+	 * Kernel sockets that are a part of a namespace, should not
+	 * hold a reference to a namespace in order to allow to stop it.
+	 * After sk_change_net should be released using sk_release_kernel.
+	 */
+	sk_change_net(sock->sk, net);
 	/* it is equivalent to the REUSEADDR option in user-space */
 	sock->sk->sk_reuse = 1;
 
@@ -1377,8 +1387,8 @@ static struct socket *make_receive_sock(struct net *net)
 
 	return sock;
 
-  error:
-	sock_release(sock);
+error:
+	sk_release_kernel(sock->sk);
 	return ERR_PTR(result);
 }
 
@@ -1473,7 +1483,7 @@ static int sync_thread_master(void *data)
 		ip_vs_sync_buff_release(sb);
 
 	/* release the sending multicast socket */
-	sock_release(tinfo->sock);
+	sk_release_kernel(tinfo->sock->sk);
 	kfree(tinfo);
 
 	return 0;
@@ -1513,7 +1523,7 @@ static int sync_thread_backup(void *data)
 	}
 
 	/* release the sending multicast socket */
-	sock_release(tinfo->sock);
+	sk_release_kernel(tinfo->sock->sk);
 	kfree(tinfo->buf);
 	kfree(tinfo);
 
@@ -1601,7 +1611,7 @@ outtinfo:
 outbuf:
 	kfree(buf);
 outsocket:
-	sock_release(sock);
+	sk_release_kernel(sock->sk);
 out:
 	return result;
 }
@@ -1610,6 +1620,7 @@ out:
 int stop_sync_thread(struct net *net, int state)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
+	int retc = -EINVAL;
 
 	IP_VS_DBG(7, "%s(): pid %d\n", __func__, task_pid_nr(current));
 
@@ -1629,7 +1640,7 @@ int stop_sync_thread(struct net *net, int state)
 		spin_lock_bh(&ipvs->sync_lock);
 		ipvs->sync_state &= ~IP_VS_STATE_MASTER;
 		spin_unlock_bh(&ipvs->sync_lock);
-		kthread_stop(ipvs->master_thread);
+		retc = kthread_stop(ipvs->master_thread);
 		ipvs->master_thread = NULL;
 	} else if (state == IP_VS_STATE_BACKUP) {
 		if (!ipvs->backup_thread)
@@ -1639,16 +1650,14 @@ int stop_sync_thread(struct net *net, int state)
 			task_pid_nr(ipvs->backup_thread));
 
 		ipvs->sync_state &= ~IP_VS_STATE_BACKUP;
-		kthread_stop(ipvs->backup_thread);
+		retc = kthread_stop(ipvs->backup_thread);
 		ipvs->backup_thread = NULL;
-	} else {
-		return -EINVAL;
 	}
 
 	/* decrease the module use count */
 	ip_vs_use_count_dec();
 
-	return 0;
+	return retc;
 }
 
 /*
@@ -1670,8 +1679,15 @@ static int __net_init __ip_vs_sync_init(struct net *net)
 
 static void __ip_vs_sync_cleanup(struct net *net)
 {
-	stop_sync_thread(net, IP_VS_STATE_MASTER);
-	stop_sync_thread(net, IP_VS_STATE_BACKUP);
+	int retc;
+
+	retc = stop_sync_thread(net, IP_VS_STATE_MASTER);
+	if (retc && retc != -ESRCH)
+		pr_err("Failed to stop Master Daemon\n");
+
+	retc = stop_sync_thread(net, IP_VS_STATE_BACKUP);
+	if (retc && retc != -ESRCH)
+		pr_err("Failed to stop Backup Daemon\n");
 }
 
 static struct pernet_operations ipvs_sync_ops = {
@@ -1682,10 +1698,10 @@ static struct pernet_operations ipvs_sync_ops = {
 
 int __init ip_vs_sync_init(void)
 {
-	return register_pernet_subsys(&ipvs_sync_ops);
+	return register_pernet_device(&ipvs_sync_ops);
 }
 
 void ip_vs_sync_cleanup(void)
 {
-	unregister_pernet_subsys(&ipvs_sync_ops);
+	unregister_pernet_device(&ipvs_sync_ops);
 }
-- 
1.7.4.4


^ permalink raw reply related

* [GIT PULL nf-2.6] IPVS (take III)
From: Simon Horman @ 2011-05-08 23:17 UTC (permalink / raw)
  To: lvs-devel, netdev, netfilter-devel, netfilter
  Cc: Wensong Zhang, Julian Anastasov, Patrick McHardy,
	Hans Schillstrom, Hans Schillstrom, Eric W. Biederman

please consider pulling

  git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-2.6.git for-nf-2.6

to get the following fix from Hans. They resolve some problems related
to his netns for IPVS work which was incorporated into 2.6.39-rc1.

The pull request is based on nf-2.6/master.

There are other less-pressing changes from Hans which
I plan to get you to pull into nf-next-2.6 once these
changes make it there (presumably via net-2.6 and then net-next-2.6).

There are minor changes since take II.

Hans Schillstrom (2):
      IPVS: Change of socket usage to enable name space exit.
      IPVS: init and cleanup restructuring

 include/net/ip_vs.h              |   17 +++++
 net/netfilter/ipvs/ip_vs_app.c   |   15 +----
 net/netfilter/ipvs/ip_vs_conn.c  |   12 +---
 net/netfilter/ipvs/ip_vs_core.c  |  103 +++++++++++++++++++++++++++++---
 net/netfilter/ipvs/ip_vs_ctl.c   |  120 ++++++++++++++++++++++++++++++++------
 net/netfilter/ipvs/ip_vs_est.c   |   14 +----
 net/netfilter/ipvs/ip_vs_proto.c |   11 +---
 net/netfilter/ipvs/ip_vs_sync.c  |   65 ++++++++++++---------
 8 files changed, 258 insertions(+), 99 deletions(-)

^ permalink raw reply

* Re: [PATCH] net/bonding: adjust codingstyle for bond_3ad files
From: David Miller @ 2011-05-08 23:10 UTC (permalink / raw)
  To: hlovdal
  Cc: mfmooney, joe, aquini, kernel-janitors, fubar, andy, shemminger,
	netdev, nikai
In-Reply-To: <BANLkTik_gMCHYp4sOnzzR1Gs=bn2pkpjfQ@mail.gmail.com>

From: Håkon Løvdal <hlovdal@gmail.com>
Date: Mon, 9 May 2011 01:08:44 +0200

> On 7 May 2011 21:35, matt mooney <mfmooney@gmail.com> wrote:
>> But isn't the preferred style to have a single exit point?
> 
> This is generally considered to be a bad advice, see
> http://stackoverflow.com/questions/1701686/why-should-methods-have-a-single-entry-and-exit-points/1701721#1701721
> for instance.

That article totally ignores the issue of locking and how hard it is
to get right without single exit points, and how unlocking in
multiple spots bloats up the code.

Definitely don't take that article's advice when working on the
kernel.

^ permalink raw reply

* Re: [PATCH] net/bonding: adjust codingstyle for bond_3ad files
From: Håkon Løvdal @ 2011-05-08 23:08 UTC (permalink / raw)
  To: matt mooney
  Cc: Joe Perches, aquini, kernel-janitors, David Miller, Jay Vosburgh,
	Andy Gospodarek, shemminger, netdev, Nicolas Kaiser
In-Reply-To: <BANLkTimKs39m_htMFEygR+3Bfu7Butc-zQ@mail.gmail.com>

On 7 May 2011 21:35, matt mooney <mfmooney@gmail.com> wrote:
> But isn't the preferred style to have a single exit point?

This is generally considered to be a bad advice, see
http://stackoverflow.com/questions/1701686/why-should-methods-have-a-single-entry-and-exit-points/1701721#1701721
for instance.

BR Håkon Løvdal

^ permalink raw reply

* Re: [PATCHv2 1/2] net: Allow ethtool to set interface in loopback mode.
From: David Miller @ 2011-05-08 23:00 UTC (permalink / raw)
  To: bhutchings; +Cc: maheshb, netdev, mirq-linux, therbert
In-Reply-To: <1304560369.3203.44.camel@localhost>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 05 May 2011 02:52:48 +0100

> On Wed, 2011-05-04 at 18:30 -0700, Mahesh Bandewar wrote:
>> This patch enables ethtool to set the loopback mode on a given interface.
>> By configuring the interface in loopback mode in conjunction with a policy
>> route / rule, a userland application can stress the egress / ingress path
>> exposing the flows of the change in progress and potentially help developer(s)
>> understand the impact of those changes without even sending a packet out
>> on the network.
>> 
>> Following set of commands illustrates one such example -
>>     a) ip -4 addr add 192.168.1.1/24 dev eth1
>>     b) ip -4 rule add from all iif eth1 lookup 250
>>     c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
>>     d) arp -Ds 192.168.1.100 eth1
>>     e) arp -Ds 192.168.1.200 eth1
>>     f) sysctl -w net.ipv4.ip_nonlocal_bind=1
>>     g) sysctl -w net.ipv4.conf.all.accept_local=1
>>     # Assuming that the machine has 8 cores
>>     h) taskset 000f netserver -L 192.168.1.200
>>     i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30
>> 
>> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> Acked-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCHv2] forcedeth: Allow ethtool to enable/disable loopback.
From: David Miller @ 2011-05-08 22:57 UTC (permalink / raw)
  To: maheshb; +Cc: netdev, therbert
In-Reply-To: <1304559400-16257-1-git-send-email-maheshb@google.com>

From: Mahesh Bandewar <maheshb@google.com>
Date: Wed,  4 May 2011 18:36:40 -0700

> @@ -4502,6 +4549,9 @@ static int nv_set_features(struct net_device *dev, u32 features)
>  
>  		spin_unlock_irq(&np->lock);
>  	}
> +	if ((changed & NETIF_F_LOOPBACK) && netif_running(dev)) {
> +		nv_set_loopback(dev, features);
> +	}
>  

Please do not create a braced basic block just for a single
line of code.  Simply:

> +	if ((changed & NETIF_F_LOOPBACK) && netif_running(dev))
> +		nv_set_loopback(dev, features);

is sufficient.

^ permalink raw reply

* Re: [PATCH] tcp_cubic: limit delayed_ack ratio to prevent divide error
From: David Miller @ 2011-05-08 22:52 UTC (permalink / raw)
  To: shemminger
  Cc: sangtae.ha, injongrhee, Valdis.Kletnieks, rdunlap, lkml, netdev,
	linux-kernel
In-Reply-To: <20110504130456.425dee68@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 4 May 2011 13:04:56 -0700

> TCP Cubic keeps a metric that estimates the amount of delayed
> acknowledgements to use in adjusting the window. If an abnormally
> large number of packets are acknowledged at once, then the update
> could wrap and reach zero. This kind of ACK could only
> happen when there was a large window and huge number of
> ACK's were lost.
> 
> This patch limits the value of delayed ack ratio. The choice of 32
> is just a conservative value since normally it should be range of 
> 1 to 4 packets.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied, thanks Stephen.

^ permalink raw reply

* Re: [PATCH] pktgen: use %pI6c for printing IPv6 addresses
From: David Miller @ 2011-05-08 22:50 UTC (permalink / raw)
  To: joe; +Cc: adobriyan, netdev
In-Reply-To: <1304460714.1788.40.camel@Joe-Laptop>

From: Joe Perches <joe@perches.com>
Date: Tue, 03 May 2011 15:11:54 -0700

> On Wed, 2011-05-04 at 00:23 +0300, Alexey Dobriyan wrote:
>> I don't know why %pI6 doesn't compress, but the format specifier is
>> kernel-standard, so use it.
> 
> Because seq_printf output using %pI6 has a known
> output style and shouldn't be changed to avoid
> breaking user applications.

I've applied Alexey's patch to net-next-2.6

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipheth: Properly distinguish length and alignment in URBs and skbs
From: David Miller @ 2011-05-08 22:46 UTC (permalink / raw)
  To: bhutchings
  Cc: david.hill, agimenez, linux-kernel, dgiagio, dborca, pmcenery,
	linux-usb, netdev
In-Reply-To: <1304444965.2873.11.camel@bwh-desktop>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 03 May 2011 18:49:25 +0100

> The USB protocol this driver implements appears to require 2 bytes of
> padding in front of each received packet.  This used to be equal to
> the value of NET_IP_ALIGN on x86, so the driver abused that constant
> and mostly worked, but this is no longer the case.  The driver also
> mixed up the URB and packet lengths, resulting in 2 bytes of junk at
> the end of the skb.
> 
> Introduce a private constant for the 2 bytes of padding; fix this
> confusion and check for the under-length case.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> Compile-tested only, as I'm not cool enough for an iPhone either.
> This is applicable to net-next-2.6 or v2.6.38.

I've applied this to net-2.6 and will conditionally queue it up for
-stable, if we need further fixups we can add relative patches.

Thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox