Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/4] e1000e: save skb counts in TX to avoid cache misses
From: Alexander Duyck @ 2010-04-26 15:55 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <alpine.DEB.1.00.1004231719050.16806@pokey.mtv.corp.google.com>

Tom Herbert wrote:
> In e1000_tx_map, precompute number of segements and bytecounts which
> are derived from fields in skb; these are stored in buffer_info.  When
> cleaning tx in e1000_clean_tx_irq use the values in the associated
> buffer_info for statistics counting, this eliminates cache misses
> on skb fields.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
> diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
> index 12648a1..d6da75b 100644
> --- a/drivers/net/e1000e/e1000.h
> +++ b/drivers/net/e1000e/e1000.h
> @@ -188,6 +188,8 @@ struct e1000_buffer {
>  			unsigned long time_stamp;
>  			u16 length;
>  			u16 next_to_watch;
> +			unsigned int segs;
> +			unsigned int bytecount;
>  			u16 mapped_as_page;
>  		};
>  		/* Rx */

Does segs need to be a full int here?  I think you can probably get away 
with either an unsigned short or just define it as a u16, then combine 
it with mapped_as_page to cut down on the overall size.

Thanks,

Alex

^ permalink raw reply

* [PATCH] cxgb3: Wait longer for control packets on initialization
From: Andre Detsch @ 2010-04-26 15:38 UTC (permalink / raw)
  To: netdev, divy


In some Power7 platforms, when using VIOS (Virtual I/O Server), we
need to wait longer for control packets to finish transfer during
initialization.
Without this change, initialization may fail prematurely.

Signed-off-by: Wen Xiong <wenxiong@us.ibm.com>
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>

Index: linux-2.6.34-rc5/drivers/net/cxgb3/cxgb3_main.c
===================================================================
--- linux-2.6.34-rc5.orig/drivers/net/cxgb3/cxgb3_main.c	2010-04-23 18:59:43.000000000 -0300
+++ linux-2.6.34-rc5/drivers/net/cxgb3/cxgb3_main.c	2010-04-23 18:59:55.000000000 -0300
@@ -439,7 +439,7 @@ static void free_irq_resources(struct ad
 static int await_mgmt_replies(struct adapter *adap, unsigned long init_cnt,
 			      unsigned long n)
 {
-	int attempts = 5;
+	int attempts = 10;

 	while (adap->sge.qs[0].rspq.offload_pkts < init_cnt + n) {
 		if (!--attempts)

^ permalink raw reply

* Re: [patch v2] bluetooth: handle l2cap_create_connless_pdu() errors
From: Gustavo F. Padovan @ 2010-04-26 15:09 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Marcel Holtmann, David S. Miller, Andrei Emeltchenko,
	linux-bluetooth, netdev, kernel-janitors
In-Reply-To: <20100426113627.GS29093@bicker>

Hi Dan,

* Dan Carpenter <error27@gmail.com> [2010-04-26 13:36:27 +0200]:

> l2cap_create_connless_pdu() can sometimes return ERR_PTR(-ENOMEM) or
> ERR_PTR(-EFAULT).
> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>
> ---
> In v2 I wrote the patch on top of Gustavo Padovon's devel tree
> 
> diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
> index a668ef4..b32682c 100644
> --- a/net/bluetooth/l2cap.c
> +++ b/net/bluetooth/l2cap.c
> @@ -1712,8 +1712,12 @@ static int l2cap_sock_sendmsg(struct kiocb *iocb, struct socket *sock, struct ms
>  	/* Connectionless channel */
>  	if (sk->sk_type == SOCK_DGRAM) {
>  		skb = l2cap_create_connless_pdu(sk, msg, len);
> -		l2cap_do_send(sk, skb);
> -		err = len;
> +		if (IS_ERR(skb)) {
> +			err = PTR_ERR(skb);

Remove the braces on the 'if', then we are fine to pick your patch ;)

> +		} else {
> +			l2cap_do_send(sk, skb);
> +			err = len;
> +		}
>  		goto done;
>  	}
>  


Regards,

-- 
Gustavo F. Padovan
http://padovan.org

^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-04-26 14:55 UTC (permalink / raw)
  To: hadi
  Cc: Changli Gao, David S. Miller, Tom Herbert, Stephen Hemminger,
	netdev, Andi Kleen
In-Reply-To: <1272290584.19143.43.camel@edumazet-laptop>

Le lundi 26 avril 2010 à 16:03 +0200, Eric Dumazet a écrit :
> Le samedi 24 avril 2010 à 10:10 -0400, jamal a écrit :
> > On Fri, 2010-04-23 at 18:02 -0400, jamal wrote:
> > 
> > > Ive done a setup with the last patch from Changli + net-next - I will
> > > post test results tomorrow AM.
> > 
> > ok, annotated results attached. 
> > 
> > cheers,
> > jamal
> 
> Jamal, I have a Nehalem setup now, and I can see
> _raw_spin_lock_irqsave() abuse is not coming from network tree, but from
> clockevents_notify()
> 

Another interesting finding:

- if all packets are received on a single queue, max speed seems to be
1.200.000 packets per second on my machine :-(

And on profile of receiving cpu (RPS enabled, pakets sent to 15 other
cpus), we can see default_send_IPI_mask_sequence_phys() is the slow
thing...

Andi, what do you think of this one ?
Dont we have a function to send an IPI to an individual cpu instead ?

void default_send_IPI_mask_sequence_phys(const struct cpumask *mask, int
vector)
{
        unsigned long query_cpu;
        unsigned long flags;

        /*
         * Hack. The clustered APIC addressing mode doesn't allow us to
send
         * to an arbitrary mask, so I do a unicast to each CPU instead.
         * - mbligh
         */
        local_irq_save(flags);
        for_each_cpu(query_cpu, mask) {
                __default_send_IPI_dest_field(per_cpu(x86_cpu_to_apicid,
                                query_cpu), vector, APIC_DEST_PHYSICAL);
        }
        local_irq_restore(flags);
}


-----------------------------------------------------------------------------------------------------------------------------------------
   PerfTop:    1000 irqs/sec  kernel:100.0% [1000Hz cycles],  (all, cpu:
7)
-----------------------------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                            DSO
             _______ _____ ___________________________________ _______

              668.00 17.7% default_send_IPI_mask_sequence_phys vmlinux
              363.00  9.6% bnx2x_rx_int                        vmlinux
              354.00  9.4% eth_type_trans                      vmlinux
              332.00  8.8% kmem_cache_alloc_node               vmlinux
              285.00  7.6% __kmalloc_node_track_caller         vmlinux
              278.00  7.4% _raw_spin_lock                      vmlinux
              166.00  4.4% __slab_alloc                        vmlinux
              147.00  3.9% __memset                            vmlinux
              136.00  3.6% list_del                            vmlinux
              132.00  3.5% get_partial_node                    vmlinux
              131.00  3.5% get_rps_cpu                         vmlinux
              102.00  2.7% enqueue_to_backlog                  vmlinux
               95.00  2.5% unmap_single                        vmlinux
               94.00  2.5% __alloc_skb                         vmlinux
               74.00  2.0% vlan_gro_common                     vmlinux
               52.00  1.4% __phys_addr                         vmlinux
               48.00  1.3% dev_gro_receive                     vmlinux
               39.00  1.0% swiotlb_dma_mapping_error           vmlinux
               36.00  1.0% swiotlb_map_page                    vmlinux
               34.00  0.9% skb_put                             vmlinux
               27.00  0.7% is_swiotlb_buffer                   vmlinux
               23.00  0.6% deactivate_slab                     vmlinux
               20.00  0.5% vlan_gro_receive                    vmlinux
               17.00  0.5% __skb_bond_should_drop              vmlinux
               14.00  0.4% netif_receive_skb                   vmlinux
               14.00  0.4% __netdev_alloc_skb                  vmlinux
               12.00  0.3% skb_gro_reset_offset                vmlinux
               12.00  0.3% get_slab                            vmlinux
               11.00  0.3% napi_skb_finish                     vmlinux



^ permalink raw reply

* [PATCH] ieee802154: Fix oops during ieee802154_sock_ioctl
From: Dmitry Eremin-Solenikov @ 2010-04-26 14:46 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, linux-kernel, Stefan Schmidt

From: Stefan Schmidt <stefan@datenfreihafen.org>

Trying to run izlisten (from lowpan-tools tests) on a device that does not
exists I got the oops below. The problem is that we are using get_dev_by_name
without checking if we really get a device back. We don't in this case and
writing to dev->type generates this oops.

[Oops code removed by Dmitry Eremin-Solenikov]

If possible this patch should be applied to the current -rc fixes branch.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
---
 net/ieee802154/af_ieee802154.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ieee802154/af_ieee802154.c b/net/ieee802154/af_ieee802154.c
index bad1c49..72340dd 100644
--- a/net/ieee802154/af_ieee802154.c
+++ b/net/ieee802154/af_ieee802154.c
@@ -147,6 +147,9 @@ static int ieee802154_dev_ioctl(struct sock *sk, struct ifreq __user *arg,
 	dev_load(sock_net(sk), ifr.ifr_name);
 	dev = dev_get_by_name(sock_net(sk), ifr.ifr_name);
 
+	if (!dev)
+		return -ENODEV;
+
 	if (dev->type == ARPHRD_IEEE802154 && dev->netdev_ops->ndo_do_ioctl)
 		ret = dev->netdev_ops->ndo_do_ioctl(dev, &ifr, cmd);
 
-- 
1.7.0

^ permalink raw reply related

* Re: [PATCH] e100: expose broadcast_disabled as a module option
From: Erwan Velu @ 2010-04-26 14:45 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jeff Kirsher, netdev, David Miller, linux-kernel,
	jesse.brandeburg, bruce.w.allan, alexander.h.duyck,
	peter.p.waskiewicz.jr, john.ronciak
In-Reply-To: <y2lb43bf5491004260149x23beb95dqc9e177d9a6ecc679@mail.gmail.com>

2010/4/26 Erwan Velu <erwanaliasr1@gmail.com>:

After some research, here come my status :

> Does any one you could help me understanding what should be the good way to
> 1- enabled/disabled IFF_BROADCAST for a given interface

Looks like the current code isn't taking care of a possible changing
of IFF_BROADCAST. I can find some code in net/core/dev.c to handle
MULTICAST or PROMISC but nothing for BROADCAST.
do I have to implement one ?

> 2- populate this changes to the driver

Looks like I need then to add a .ndo_change_rx_flags function to
manage this changes.

Does someone confirm this is the way to go ?

^ permalink raw reply

* Re: DDoS attack causing bad effect on conntrack searches
From: Jesper Dangaard Brouer @ 2010-04-26 14:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jesper Dangaard Brouer, paulmck, Patrick McHardy, Changli Gao,
	Linux Kernel Network Hackers, Netfilter Developers
In-Reply-To: <1272139861.20714.525.camel@edumazet-laptop>

On Sat, 2010-04-24 at 22:11 +0200, Eric Dumazet wrote:
>  
> > Monday or Tuesdag I'll do a test setup with some old HP380 G4 machines to 
> > see if I can reproduce the DDoS attack senario.  And see if I can get 
> > it into to lookup loop.
> 
> Theorically a loop is very unlikely, given a single retry is very
> unlikly too.
> 
> Unless a cpu gets in its cache a corrupted value of a 'next' pointer.
> 
...
>
> With same hash bucket size (300.032) and max conntracks (800.000), and
> after more than 10 hours of test, not a single lookup was restarted
> because of a nulls with wrong value.

So fare, I have to agree with you.  I have now tested on the same type
of hardware (although running a 64-bit kernel, and off net-next-2.6),
and the result is, the same as yours, I don't see a any restarts of the
loop.  The test systems differs a bit, as it has two physical CPU and 2M
cache (and annoyingly the system insists on using HPET as clocksource).

Guess, the only explanation would be bad/sub-optimal hash distribution.
With 40 kpps and 700.000 'searches' per second, the hash bucket list
length "only" need to be 17.5 elements on average, where optimum is 3.
With my pktgen DoS test, where I tried to reproduce the DoS attack, only
see a screw of 6 elements on average.

> I can setup a test on a 16 cpu machine, multiqueue card too.

Don't think that is necessary.  My theory was it was possible on slower
single queue NIC, where one CPU is 100% busy in the conntrack search,
and the other CPUs delete the entries (due to early drop and
call_rcu()).  But guess that note the case, and RCU works perfectly ;-)

> Hmm, I forgot to say I am using net-next-2.6, not your kernel version...

I also did this test using net-next-2.6, perhaps I should try the
version I use in production...

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network Kernel Developer
  Cand. Scient Datalog / MSc.CS
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-04-26 14:03 UTC (permalink / raw)
  To: hadi; +Cc: Changli Gao, David S. Miller, Tom Herbert, Stephen Hemminger,
	netdev
In-Reply-To: <1272118252.8918.13.camel@bigi>

Le samedi 24 avril 2010 à 10:10 -0400, jamal a écrit :
> On Fri, 2010-04-23 at 18:02 -0400, jamal wrote:
> 
> > Ive done a setup with the last patch from Changli + net-next - I will
> > post test results tomorrow AM.
> 
> ok, annotated results attached. 
> 
> cheers,
> jamal

Jamal, I have a Nehalem setup now, and I can see
_raw_spin_lock_irqsave() abuse is not coming from network tree, but from
clockevents_notify()

My pktgen sends 1040989pps :

# Samples: 389707198131
#
# Overhead         Command                 Shared Object  Symbol
# ........  ..............  ............................  ......
#
    23.52%            init  [kernel.kallsyms]             [k] _raw_spin_lock_irqsave
                      |
                      --- _raw_spin_lock_irqsave
                         |          
                         |--94.74%-- clockevents_notify
                         |          lapic_timer_state_broadcast
                         |          acpi_idle_enter_bm
                         |          cpuidle_idle_call
                         |          cpu_idle
                         |          start_secondary
                         |          
                         |--4.10%-- tick_broadcast_oneshot_control
                         |          tick_notify
                         |          notifier_call_chain
                         |          __raw_notifier_call_chain
                         |          raw_notifier_call_chain
                         |          clockevents_do_notify
                         |          clockevents_notify
                         |          lapic_timer_state_broadcast
                         |          acpi_idle_enter_bm
                         |          cpuidle_idle_call
                         |          cpu_idle
                         |          start_secondary
                         |          
                         |--0.58%-- lapic_timer_state_broadcast
                         |          acpi_idle_enter_bm
                         |          cpuidle_idle_call
                         |          cpu_idle
                         |          start_secondary
                          --0.58%-- [...]

     8.94%            init  [kernel.kallsyms]             [k] acpi_os_read_port
                      |
                      --- acpi_os_read_port
                         |          
                         |--99.55%-- acpi_hw_read_port
                         |          acpi_hw_read
                         |          acpi_hw_read_multiple
                         |          acpi_hw_register_read
                         |          acpi_read_bit_register



# Samples: 389233082962
#
# Overhead         Command                 Shared Object  Symbol
# ........  ..............  ............................  ......
#
    23.25%            init  [kernel.kallsyms]             [k] _raw_spin_lock_irqsave
     8.90%            init  [kernel.kallsyms]             [k] acpi_os_read_port
     2.93%            init  [kernel.kallsyms]             [k] mwait_idle_with_hints
     1.99%            init  [kernel.kallsyms]             [k] schedule
     1.94%         udpsink  [kernel.kallsyms]             [k] schedule
     1.73%         swapper  [kernel.kallsyms]             [k] _raw_spin_lock_irqsave
     1.48%            init  [kernel.kallsyms]             [k] bnx2x_rx_int
     1.47%            init  [kernel.kallsyms]             [k] _raw_spin_unlock_irqrestore
     1.44%            init  [kernel.kallsyms]             [k] _raw_spin_lock
     1.36%         udpsink  [kernel.kallsyms]             [k] udp_recvmsg
     1.05%         udpsink  [kernel.kallsyms]             [k] __skb_recv_datagram
     1.05%            init  [kernel.kallsyms]             [k] __udp4_lib_lookup
     1.04%         udpsink  [kernel.kallsyms]             [k] copy_user_generic_string
     1.04%         udpsink  [kernel.kallsyms]             [k] __slab_free
     0.99%            init  [kernel.kallsyms]             [k] select_task_rq_fair
     0.99%            init  [kernel.kallsyms]             [k] try_to_wake_up
     0.98%            init  [kernel.kallsyms]             [k] task_rq_lock
     0.93%            init  [kernel.kallsyms]             [k] tick_broadcast_oneshot_control
     0.89%            init  [kernel.kallsyms]             [k] sock_queue_rcv_skb
     0.89%         udpsink  [kernel.kallsyms]             [k] sock_recv_ts_and_drops
     0.88%         udpsink  [kernel.kallsyms]             [k] kfree
     0.79%         swapper  [kernel.kallsyms]             [k] acpi_os_read_port
     0.76%         udpsink  [kernel.kallsyms]             [k] _raw_spin_lock_irqsave
     0.73%         udpsink  [kernel.kallsyms]             [k] inet_recvmsg
     0.71%         udpsink  [vdso]                        [.] 0x000000ffffe431
     0.65%         udpsink  [kernel.kallsyms]             [k] sock_recvmsg
     0.62%            init  [kernel.kallsyms]             [k] gs_change
     0.61%            init  [kernel.kallsyms]             [k] enqueue_task_fair
     0.61%            init  [kernel.kallsyms]             [k] eth_type_trans
     0.61%            init  [kernel.kallsyms]             [k] sock_def_readable
     0.60%         udpsink  [kernel.kallsyms]             [k] _raw_spin_lock_bh
     0.59%            init  [kernel.kallsyms]             [k] ip_route_input
     0.59%         udpsink  libpthread-2.3.4.so           [.] __pthread_disable_asynccancel
     0.56%            init  [kernel.kallsyms]             [k] bnx2x_poll
     0.56%         udpsink  [kernel.kallsyms]             [k] __get_user_4



^ permalink raw reply

* [PATCH net-next-2.6] pppoe: use phonet_pernet instead of directly net_generic
From: Jiri Pirko @ 2010-04-26 13:41 UTC (permalink / raw)
  To: netdev; +Cc: davem

As in for example pppoe introduce phonet_pernet and use it instead of calling
net_generic directly.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c
index 9b4ced6..c33da65 100644
--- a/net/phonet/pn_dev.c
+++ b/net/phonet/pn_dev.c
@@ -46,9 +46,16 @@ struct phonet_net {
 
 int phonet_net_id __read_mostly;
 
+static struct phonet_net *phonet_pernet(struct net *net)
+{
+	BUG_ON(!net);
+
+	return net_generic(net, phonet_net_id);
+}
+
 struct phonet_device_list *phonet_device_list(struct net *net)
 {
-	struct phonet_net *pnn = net_generic(net, phonet_net_id);
+	struct phonet_net *pnn = phonet_pernet(net);
 	return &pnn->pndevs;
 }
 
@@ -261,7 +268,7 @@ static int phonet_device_autoconf(struct net_device *dev)
 
 static void phonet_route_autodel(struct net_device *dev)
 {
-	struct phonet_net *pnn = net_generic(dev_net(dev), phonet_net_id);
+	struct phonet_net *pnn = phonet_pernet(dev_net(dev));
 	unsigned i;
 	DECLARE_BITMAP(deleted, 64);
 
@@ -313,7 +320,7 @@ static struct notifier_block phonet_device_notifier = {
 /* Per-namespace Phonet devices handling */
 static int __net_init phonet_init_net(struct net *net)
 {
-	struct phonet_net *pnn = net_generic(net, phonet_net_id);
+	struct phonet_net *pnn = phonet_pernet(net);
 
 	if (!proc_net_fops_create(net, "phonet", 0, &pn_sock_seq_fops))
 		return -ENOMEM;
@@ -326,7 +333,7 @@ static int __net_init phonet_init_net(struct net *net)
 
 static void __net_exit phonet_exit_net(struct net *net)
 {
-	struct phonet_net *pnn = net_generic(net, phonet_net_id);
+	struct phonet_net *pnn = phonet_pernet(net);
 	struct net_device *dev;
 	unsigned i;
 
@@ -376,7 +383,7 @@ void phonet_device_exit(void)
 
 int phonet_route_add(struct net_device *dev, u8 daddr)
 {
-	struct phonet_net *pnn = net_generic(dev_net(dev), phonet_net_id);
+	struct phonet_net *pnn = phonet_pernet(dev_net(dev));
 	struct phonet_routes *routes = &pnn->routes;
 	int err = -EEXIST;
 
@@ -393,7 +400,7 @@ int phonet_route_add(struct net_device *dev, u8 daddr)
 
 int phonet_route_del(struct net_device *dev, u8 daddr)
 {
-	struct phonet_net *pnn = net_generic(dev_net(dev), phonet_net_id);
+	struct phonet_net *pnn = phonet_pernet(dev_net(dev));
 	struct phonet_routes *routes = &pnn->routes;
 
 	daddr = daddr >> 2;
@@ -413,7 +420,7 @@ int phonet_route_del(struct net_device *dev, u8 daddr)
 
 struct net_device *phonet_route_get(struct net *net, u8 daddr)
 {
-	struct phonet_net *pnn = net_generic(net, phonet_net_id);
+	struct phonet_net *pnn = phonet_pernet(net);
 	struct phonet_routes *routes = &pnn->routes;
 	struct net_device *dev;
 
@@ -428,7 +435,7 @@ struct net_device *phonet_route_get(struct net *net, u8 daddr)
 
 struct net_device *phonet_route_output(struct net *net, u8 daddr)
 {
-	struct phonet_net *pnn = net_generic(net, phonet_net_id);
+	struct phonet_net *pnn = phonet_pernet(net);
 	struct phonet_routes *routes = &pnn->routes;
 	struct net_device *dev;
 

^ permalink raw reply related

* Re: rps perfomance WAS(Re: rps: question
From: Changli Gao @ 2010-04-26 13:35 UTC (permalink / raw)
  To: hadi; +Cc: Eric Dumazet, Rick Jones, David Miller, therbert, netdev, robert,
	andi
In-Reply-To: <1272281709.8918.35.camel@bigi>

On Mon, Apr 26, 2010 at 7:35 PM, jamal <hadi@cyberus.ca> wrote:
> On Sun, 2010-04-25 at 10:31 +0800, Changli Gao wrote:
>
>> I read the code again, and find that we don't use spin_lock_irqsave(),
>> and we use local_irq_save() and spin_lock() instead, so
>> _raw_spin_lock_irqsave() and _raw_spin_lock_irqrestore() should not be
>> related to backlog. the lock maybe sk_receive_queue.lock.
>
> Possible.
> I am wondering if there's a way we can precisely nail where that is
> happening? is lockstat any use?
> Fixing _raw_spin_lock_irqsave and friend is the lowest hanging fruit.
>

Maybe lockstat can help in this case.

> So looking at your patch now i see it is likely there was an improvement
> made for non-rps case (moving out of loop some irq_enable etc).
> i.e my results may not be crazy after adding your patch and seeing an
> improvement for non-rps case.
> However, whatever your patch did - it did not help the rps case case:
> call_function_single_interrupt() comes out higher in the profile,
> and # of IPIs seems to have gone up (although i did not measure this, I
> can see the interrupts/second went up by almost 50-60%)

Did you apply the patch from Eric? It would reduce the number of
local_irq_disable() calls but increase the number of IPIs.

>
>> Jamal, did you use a single socket to serve all the clients?
>
> Socket per detected cpu.

Ignore it. I made a mistake here.

>
>> BTW:  completion_queue and output_queue in softnet_data both are LIFO
>> queues. For completion_queue, FIFO is better, as the last used skb is
>> more likely in cache, and should be used first. Since slab has always
>> cache the last used memory at the head, we'd better free the skb in
>> FIFO manner. For output_queue, FIFO is good for fairness among qdiscs.
>
> I think it will depend on how many of those skbs are sitting in the
> completion queue, cache warmth etc. LIFO is always safest, you have
> higher probability of finding a cached skb infront.
>

we call kfree_skb() to release skbs to slab allocator, then slab
allocator stores them in a LIFO queue. If completion queue is also a
LIFO queue, the latest unused skb will be in the front of the queue,
and will be released to slab allocator at first. At the next time, we
call alloc_skb(), the memory used by the skb in the end of the
completion queue will be returned instead of the hot one.

However, as Eric said, new drivers don't rely on completion queue, it
isn't a real problem, especially in your test case.


-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH net-2.6] netns: assign NULL after pernet memory is freed
From: Jiri Pirko @ 2010-04-26 13:30 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm
In-Reply-To: <20100426120716.GD2941@psychotron.lab.eng.brq.redhat.com>

Mon, Apr 26, 2010 at 02:07:17PM CEST, jpirko@redhat.com wrote:
>Mon, Apr 26, 2010 at 01:18:07PM CEST, jpirko@redhat.com wrote:
>>This is needed to let know appropriate code (driver) know that pernet memory
>>chunk  was freed already. For example when driver has also registered netdev
>>notifier, it can be called after memory is freed which could eventually lead
>>to panic. Also a null check should be present in these notifiers.
>>
>>For example, previously, when drivers were responsible for pernet memory
>>allocation/freeing, in pppoe this assign was done in pppoe_exit_net() right
>>after pernet memory was freed. The check in notifier stayed (in pppoe_flush_dev
>>called from pppoe_device_event) but makes no sense now without this patch.
>
>Hmm, thinking about this, since the life of pernet memories is directly connected
>with the life of net, as described by Eric, this should not be needed. So please
>scrach this one too. Sorry.
>
>Anyway, I'm still wondering how to prevent netdev_notifiers from using this when
>net is gone. Going to do more research, I'm probably missing something.

Right, it becomes part of init_ns then. So this looks good then. One thing I can
still imagine what may cause problem, since netdev_notifier is registered before
register_pernet_device, the notifier can be called in between. So I think this
should be reordered, am I right?

Thanks

Jirka

>
>
>>
>>I already know about several notifiers where the check should be present,
>>patches will follow up.
>>
>>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>
>>diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>>index bd8c471..29f622c 100644
>>--- a/net/core/net_namespace.c
>>+++ b/net/core/net_namespace.c
>>@@ -51,6 +51,7 @@ static void ops_free(const struct pernet_operations *ops, struct net *net)
>> 	if (ops->id && ops->size) {
>> 		int id = *ops->id;
>> 		kfree(net_generic(net, id));
>>+		net_assign_generic(net, id, NULL);
>> 	}
>> }
>> 

^ permalink raw reply

* Re: [PATCH net-2.6] netns: assign NULL after pernet memory is freed
From: Jiri Pirko @ 2010-04-26 12:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm
In-Reply-To: <20100426111806.GB2941@psychotron.lab.eng.brq.redhat.com>

Mon, Apr 26, 2010 at 01:18:07PM CEST, jpirko@redhat.com wrote:
>This is needed to let know appropriate code (driver) know that pernet memory
>chunk  was freed already. For example when driver has also registered netdev
>notifier, it can be called after memory is freed which could eventually lead
>to panic. Also a null check should be present in these notifiers.
>
>For example, previously, when drivers were responsible for pernet memory
>allocation/freeing, in pppoe this assign was done in pppoe_exit_net() right
>after pernet memory was freed. The check in notifier stayed (in pppoe_flush_dev
>called from pppoe_device_event) but makes no sense now without this patch.

Hmm, thinking about this, since the life of pernet memories is directly connected
with the life of net, as described by Eric, this should not be needed. So please
scrach this one too. Sorry.

Anyway, I'm still wondering how to prevent netdev_notifiers from using this when
net is gone. Going to do more research, I'm probably missing something.


>
>I already know about several notifiers where the check should be present,
>patches will follow up.
>
>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>index bd8c471..29f622c 100644
>--- a/net/core/net_namespace.c
>+++ b/net/core/net_namespace.c
>@@ -51,6 +51,7 @@ static void ops_free(const struct pernet_operations *ops, struct net *net)
> 	if (ops->id && ops->size) {
> 		int id = *ops->id;
> 		kfree(net_generic(net, id));
>+		net_assign_generic(net, id, NULL);
> 	}
> }
> 

^ permalink raw reply

* [PATCH net-next-2.6] pppoe: use pppoe_pernet instead of directly net_generic
From: Jiri Pirko @ 2010-04-26 11:46 UTC (permalink / raw)
  To: netdev; +Cc: davem, mostrows


Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index cdd11ba..c059c8d 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -258,7 +258,7 @@ static inline struct pppox_sock *get_item_by_addr(struct net *net,
 	dev = dev_get_by_name_rcu(net, sp->sa_addr.pppoe.dev);
 	if (dev) {
 		ifindex = dev->ifindex;
-		pn = net_generic(net, pppoe_net_id);
+		pn = pppoe_pernet(net);
 		pppox_sock = get_item(pn, sp->sa_addr.pppoe.sid,
 				sp->sa_addr.pppoe.remote, ifindex);
 	}

^ permalink raw reply related

* [patch v2] bluetooth: handle l2cap_create_connless_pdu() errors
From: Dan Carpenter @ 2010-04-26 11:36 UTC (permalink / raw)
  To: Gustavo F. Padovan
  Cc: Marcel Holtmann, David S. Miller, Andrei Emeltchenko,
	linux-bluetooth, netdev, kernel-janitors
In-Reply-To: <20100422123347.GA5265@vigoh>

l2cap_create_connless_pdu() can sometimes return ERR_PTR(-ENOMEM) or
ERR_PTR(-EFAULT).

Signed-off-by: Dan Carpenter <error27@gmail.com>
---
In v2 I wrote the patch on top of Gustavo Padovon's devel tree

diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
index a668ef4..b32682c 100644
--- a/net/bluetooth/l2cap.c
+++ b/net/bluetooth/l2cap.c
@@ -1712,8 +1712,12 @@ static int l2cap_sock_sendmsg(struct kiocb *iocb, struct socket *sock, struct ms
 	/* Connectionless channel */
 	if (sk->sk_type == SOCK_DGRAM) {
 		skb = l2cap_create_connless_pdu(sk, msg, len);
-		l2cap_do_send(sk, skb);
-		err = len;
+		if (IS_ERR(skb)) {
+			err = PTR_ERR(skb);
+		} else {
+			l2cap_do_send(sk, skb);
+			err = len;
+		}
 		goto done;
 	}
 

^ permalink raw reply related

* Re: rps perfomance WAS(Re: rps: question
From: jamal @ 2010-04-26 11:35 UTC (permalink / raw)
  To: Changli Gao
  Cc: Eric Dumazet, Rick Jones, David Miller, therbert, netdev, robert,
	andi
In-Reply-To: <t2r412e6f7f1004241931ie9e70e3aka77b49557a7872e3@mail.gmail.com>

On Sun, 2010-04-25 at 10:31 +0800, Changli Gao wrote:

> I read the code again, and find that we don't use spin_lock_irqsave(),
> and we use local_irq_save() and spin_lock() instead, so
> _raw_spin_lock_irqsave() and _raw_spin_lock_irqrestore() should not be
> related to backlog. the lock maybe sk_receive_queue.lock.

Possible.
I am wondering if there's a way we can precisely nail where that is
happening? is lockstat any use? 
Fixing _raw_spin_lock_irqsave and friend is the lowest hanging fruit.

So looking at your patch now i see it is likely there was an improvement
made for non-rps case (moving out of loop some irq_enable etc).
i.e my results may not be crazy after adding your patch and seeing an
improvement for non-rps case.
However, whatever your patch did - it did not help the rps case case:
call_function_single_interrupt() comes out higher in the profile,
and # of IPIs seems to have gone up (although i did not measure this, I
can see the interrupts/second went up by almost 50-60%)

> Jamal, did you use a single socket to serve all the clients?

Socket per detected cpu.

> BTW:  completion_queue and output_queue in softnet_data both are LIFO
> queues. For completion_queue, FIFO is better, as the last used skb is
> more likely in cache, and should be used first. Since slab has always
> cache the last used memory at the head, we'd better free the skb in
> FIFO manner. For output_queue, FIFO is good for fairness among qdiscs.

I think it will depend on how many of those skbs are sitting in the
completion queue, cache warmth etc. LIFO is always safest, you have
higher probability of finding a cached skb infront.

cheers,
jamal

^ permalink raw reply

* [PATCH net-2.6] netns: assign NULL after pernet memory is freed
From: Jiri Pirko @ 2010-04-26 11:18 UTC (permalink / raw)
  To: netdev; +Cc: davem, ebiederm

This is needed to let know appropriate code (driver) know that pernet memory
chunk  was freed already. For example when driver has also registered netdev
notifier, it can be called after memory is freed which could eventually lead
to panic. Also a null check should be present in these notifiers.

For example, previously, when drivers were responsible for pernet memory
allocation/freeing, in pppoe this assign was done in pppoe_exit_net() right
after pernet memory was freed. The check in notifier stayed (in pppoe_flush_dev
called from pppoe_device_event) but makes no sense now without this patch.

I already know about several notifiers where the check should be present,
patches will follow up.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index bd8c471..29f622c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -51,6 +51,7 @@ static void ops_free(const struct pernet_operations *ops, struct net *net)
 	if (ops->id && ops->size) {
 		int id = *ops->id;
 		kfree(net_generic(net, id));
+		net_assign_generic(net, id, NULL);
 	}
 }

^ permalink raw reply related

* Re: [PATCH net-next-2.6] netns: call ops_free right after ops_exit
From: Jiri Pirko @ 2010-04-26 10:43 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: David Miller, netdev
In-Reply-To: <m1r5m2ud40.fsf@fess.ebiederm.org>

Mon, Apr 26, 2010 at 05:06:07AM CEST, ebiederm@xmission.com wrote:
>Jiri Pirko <jpirko@redhat.com> writes:
>
>> Sun, Apr 25, 2010 at 04:50:34PM CEST, ebiederm@xmission.com wrote:
>>>David Miller <davem@davemloft.net> writes:
>>>
>>>> From: Jiri Pirko <jpirko@redhat.com>
>>>> Date: Sun, 25 Apr 2010 11:26:01 +0200
>>>>
>>>>> There's no need to iterate this twice. We can free net generic
>>>>> variables right after exit is called.
>>>>>
>>>>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>>>
>>>> Are you sure there are no problems with doing this?
>>>>
>>>> What if there are inter-net variable reference dependencies
>>>> or something like that?
>>>>
>>>> I really suspect it is being done this way on purpose, but
>>>> in the end I defer to experts like Eric B. :-)
>>>
>>>I am pretty certain there is a problem.  My memory is fuzzy this
>>>morning but I believe we can have rcu references between various
>>>pieces of the networking stack for a single network namespace.  So we
>>>need to cause all of the network namespace to exit before it is safe
>>>to free those pieces.
>>
>> Hmm, that doesn't make much sense to me. Since the allocated memory in question
>> is used locally, after exit() is called, the memory chunk should not be used by
>> anyone and if it is, I think it's a bug.
>>
>> Earlier, when the memory wasn't allocated automatically (by filling .size)
>> memory was individually freed in exit(). From what I understood from your reply,
>> you are telling this was buggy?
>
>Now I remember clearly.  The use case for not freeing memory
>immediately is the delayed freeing of network devices.  Ideally we
>delay unregistering all of the network devices into
>default_device_exit_batch, when we terminate one or namespaces.
>
>Since network devices are freed outside of their exit routines we need
>to keep their per net memory around until they are freed.
>
>Ultimately all of this is much easier to think about if these chunks of
>memory can be logically thought of as living on struct net and have the
>same lifetime rules.

Ok, I see your point. So scratch this.

Thanks,

	Jirka
>
>Eric

^ permalink raw reply

* Re: [PATCH torvalds-2.6] cdc_ether: fix autosuspend for mbm devices
From: Oliver Neukum @ 2010-04-26 10:30 UTC (permalink / raw)
  To: Torgny Johansson; +Cc: netdev
In-Reply-To: <201004261214.13836.torgny.johansson@ericsson.com>

Am Montag, 26. April 2010 12:14:13 schrieb Torgny Johansson:
> Hi!
> 
> In 2.6.33 it seems autosuspend is broken for mbm devices.

It is hardly broken. It simply wasn't implemented.

> Autosuspend works until you bring the wwan interface up, then the device does not enter autosuspend anymore.
> 
> The following patch fixes the problem by setting the .manage_power field in the mbm_info struct 
> to the same as in the cdc_info struct (cdc_manager_power).
> 
> I am unsure exactly what that does and why autosuspend doesn't work without it. 
> Can you guys comment on that? Is this fix the correct approach?

The patch is correct. If you use another struct driver_info it is a different
driver as far as usbnet is concerned. A driver needs to give some
minimal support for autosuspend. mbm didn't do so.
 
> Also, if you like this patch, is it possible to get it included in a minor release of 2.6.33 (e.g. 2.6.33.3) as a bugfix?

Up to David.

	Regards
		Oliver

^ permalink raw reply

* [PATCH torvalds-2.6] cdc_ether: fix autosuspend for mbm devices
From: Torgny Johansson @ 2010-04-26 10:14 UTC (permalink / raw)
  To: oliver, davem, netdev

Hi!

In 2.6.33 it seems autosuspend is broken for mbm devices.

Autosuspend works until you bring the wwan interface up, then the device does not enter autosuspend anymore.

The following patch fixes the problem by setting the .manage_power field in the mbm_info struct 
to the same as in the cdc_info struct (cdc_manager_power).

I am unsure exactly what that does and why autosuspend doesn't work without it. 
Can you guys comment on that? Is this fix the correct approach?

Also, if you like this patch, is it possible to get it included in a minor release of 2.6.33 (e.g. 2.6.33.3) as a bugfix?

Regards
Torgny Johansson

---
Signed-off-by: Torgny Johansson <torgny.johansson@ericsson.com>

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index c8cdb7f..3547cf1 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -431,6 +431,7 @@ static const struct driver_info mbm_info = {
 	.bind = 	cdc_bind,
 	.unbind =	usbnet_cdc_unbind,
 	.status =	cdc_status,
+	.manage_power =	cdc_manage_power,
 };

 /*-------------------------------------------------------------------------*/

^ permalink raw reply related

* Re: [PATCH v3] TCP: avoid to send keepalive probes if receiving data
From: Ilpo Järvinen @ 2010-04-26  9:47 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: Netdev, David Miller, Eric Dumazet
In-Reply-To: <1272249659-2292-1-git-send-email-fleitner@redhat.com>

On Sun, 25 Apr 2010, Flavio Leitner wrote:

> RFC 1122 says the following:
> ...
>   Keep-alive packets MUST only be sent when no data or
>   acknowledgement packets have been received for the
>   connection within an interval.
> ...
> 
> The acknowledgement packet is reseting the keepalive
> timer but the data packet isn't. This patch fixes it by
> checking the timestamp of the last received data packet
> too when the keepalive timer expires.
> 
> Signed-off-by: Flavio Leitner <fleitner@redhat.com>
> ---
>  net/ipv4/tcp.c       |    5 ++++-
>  net/ipv4/tcp_timer.c |    8 ++++++++
>  2 files changed, 12 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 0f8caf6..94f605c 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2298,7 +2298,10 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>  			if (sock_flag(sk, SOCK_KEEPOPEN) &&
>  			    !((1 << sk->sk_state) &
>  			      (TCPF_CLOSE | TCPF_LISTEN))) {
> -				__u32 elapsed = tcp_time_stamp - tp->rcv_tstamp;
> +				u32 elapsed = min_t(u32,
> +				      tcp_time_stamp - icsk->icsk_ack.lrcvtime,
> +				      tcp_time_stamp - tp->rcv_tstamp);
> +

I'd put this into a helper in include/net/tcp.h

>  				if (tp->keepalive_time > elapsed)
>  					elapsed = tp->keepalive_time - elapsed;
>  				else
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index 8a0ab29..9d1bb6f 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -554,6 +554,14 @@ static void tcp_keepalive_timer (unsigned long data)
>  	if (tp->packets_out || tcp_send_head(sk))
>  		goto resched;
>  
> +	elapsed = tcp_time_stamp - icsk->icsk_ack.lrcvtime;
> +
> +	/* receiving data means alive */
> +	if (elapsed < keepalive_time_when(tp)) {
> +		elapsed = keepalive_time_when(tp) - elapsed;
> +		goto resched;
> +	}
> +
>  	elapsed = tcp_time_stamp - tp->rcv_tstamp;

...And then use it here too for setting the elapsed and doing the test 
and what follows only once?

>  	if (elapsed >= keepalive_time_when(tp)) {
> 

-- 
 i.

^ permalink raw reply

* Re: [PATCH 2/2] net: reimplement softnet_data.output_queue as a FIFO  queue
From: Eric Dumazet @ 2010-04-26  9:30 UTC (permalink / raw)
  To: Changli Gao; +Cc: netdev
In-Reply-To: <q2p412e6f7f1004260211t19efd3ebk183577ac4a39d31e@mail.gmail.com>

Le lundi 26 avril 2010 à 17:11 +0800, Changli Gao a écrit :
> On Mon, Apr 26, 2010 at 11:26 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> > reimplement softnet_data.output_queue as a FIFO queue.
> >
> > reimplement softnet_data.output_queue as a FIFO queue to keep the fairness among
> > the qdiscs rescheduled.
> >
> 
> Dear Eric:
>   do you think it is worth continuing? I didn't received any comments
> even negative ones. I am really sorry for disrupting you, but I don't
> know what did it mean.
> 

For this patch I had no comment, I believe it is fine, at a first
glance...

Please give some time for netdev people to review it and Ack it.




^ permalink raw reply

* Re: [PATCH 2/2] net: reimplement softnet_data.output_queue as a FIFO queue
From: Changli Gao @ 2010-04-26  9:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Changli Gao
In-Reply-To: <1272252361-11799-1-git-send-email-xiaosuo@gmail.com>

On Mon, Apr 26, 2010 at 11:26 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> reimplement softnet_data.output_queue as a FIFO queue.
>
> reimplement softnet_data.output_queue as a FIFO queue to keep the fairness among
> the qdiscs rescheduled.
>

Dear Eric:
  do you think it is worth continuing? I didn't received any comments
even negative ones. I am really sorry for disrupting you, but I don't
know what did it mean.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] e100: expose broadcast_disabled as a module option
From: Erwan Velu @ 2010-04-26  8:49 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jeff Kirsher, netdev, David Miller, linux-kernel,
	jesse.brandeburg, bruce.w.allan, alexander.h.duyck,
	peter.p.waskiewicz.jr, john.ronciak
In-Reply-To: <20100423150207.7969c9f6@nehalam>

Agreed that will be a better implementation. Changes of IFF_BROADCAST
could play with this "broadcast_disabled" configuration switch to
increase the cpu efficiency.

 I'm not a kernel expert and I don't really figure how the changes of
the IFF_BROADCAST should be forwarded to the interfaces and how it can
be manipulated. I saw that ifconfig have a '-broadcast' option but
looks like none of my drivers are compatible with that.

Does any one you could help me understanding what should be the good way to
1- enabled/disabled IFF_BROADCAST for a given interface
2- populate this changes to the driver

Once I'll be able to have a proper patch using that, I'll post it
again to the list.

Cheers,
Erwan

2010/4/24 Stephen Hemminger <shemminger@vyatta.com>:
> On Fri, 23 Apr 2010 23:03:59 +0200
[...]
> The point is that the driver can look at IFF_BROADCAST rather than having
> module parameter. Module parameters are device driver specific and should
> be avoid as much as possible in favor of general mechanism. This is a repeated
> problem where users and vendors make special hooks that only work with their
> driver, which makes life hard for other users and distribution providers.
>

^ permalink raw reply

* Re: [PATCH v5] rfs: Receive Flow Steering
From: Eric Dumazet @ 2010-04-26  8:41 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev
In-Reply-To: <20100419.141947.129754682.davem@davemloft.net>

Le lundi 19 avril 2010 à 14:19 -0700, David Miller a écrit :

> 
> I was thinking also about how we could compute rxhash in the
> loopback driver :-)

This would be easy if rxhash was not a "struct inet_sock" field but a
"struct sock" one

sock_alloc_send_pskb() (or skb_set_owner_w())

skb->rxhash = sk->rxhash;




^ permalink raw reply

* Re: [PATCH net-next-2.6] net: sk_sleep() helper
From: Eric Dumazet @ 2010-04-26  8:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100420.163919.82125824.davem@davemloft.net>

Le mardi 20 avril 2010 à 16:39 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 21 Apr 2010 01:03:51 +0200
> 
> > [PATCH net-next-2.6] net: sk_sleep() helper
> 
> Looks good, applied, thanks Eric.

Here is a followup to this patch, I missed three files in this
conversion.

Thanks !

[PATCH net-next-2.6] net: use sk_sleep()

Commit aa395145 (net: sk_sleep() helper) missed three files in the
conversion.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/macvtap.c  |    6 +++---
 net/caif/caif_socket.c |   30 +++++++++++++++---------------
 net/rxrpc/ar-recvmsg.c |    6 +++---
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 85d6420..d97e1fd 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -181,7 +181,7 @@ static int macvtap_forward(struct net_device *dev, struct sk_buff *skb)
 		return -ENOLINK;
 
 	skb_queue_tail(&q->sk.sk_receive_queue, skb);
-	wake_up_interruptible_poll(q->sk.sk_sleep, POLLIN | POLLRDNORM | POLLRDBAND);
+	wake_up_interruptible_poll(sk_sleep(&q->sk), POLLIN | POLLRDNORM | POLLRDBAND);
 	return 0;
 }
 
@@ -562,7 +562,7 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
 	struct sk_buff *skb;
 	ssize_t ret = 0;
 
-	add_wait_queue(q->sk.sk_sleep, &wait);
+	add_wait_queue(sk_sleep(&q->sk), &wait);
 	while (len) {
 		current->state = TASK_INTERRUPTIBLE;
 
@@ -587,7 +587,7 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
 	}
 
 	current->state = TASK_RUNNING;
-	remove_wait_queue(q->sk.sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(&q->sk), &wait);
 	return ret;
 }
 
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 90317e7..d455375 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -169,7 +169,7 @@ static int caif_sktrecv_cb(struct cflayer *layr, struct cfpkt *pkt)
 
 	/* Signal reader that data is available. */
 
-	wake_up_interruptible(cf_sk->sk.sk_sleep);
+	wake_up_interruptible(sk_sleep(&cf_sk->sk));
 
 	return 0;
 }
@@ -203,7 +203,7 @@ static void caif_sktflowctrl_cb(struct cflayer *layr,
 		dbfs_atomic_inc(&cnt.num_tx_flow_on_ind);
 		/* Signal reader that data is available. */
 		SET_TX_FLOW_ON(cf_sk);
-		wake_up_interruptible(cf_sk->sk.sk_sleep);
+		wake_up_interruptible(sk_sleep(&cf_sk->sk));
 		break;
 
 	case CAIF_CTRLCMD_FLOW_OFF_IND:
@@ -217,7 +217,7 @@ static void caif_sktflowctrl_cb(struct cflayer *layr,
 		caif_assert(STATE_IS_OPEN(cf_sk));
 		SET_PENDING_OFF(cf_sk);
 		SET_TX_FLOW_ON(cf_sk);
-		wake_up_interruptible(cf_sk->sk.sk_sleep);
+		wake_up_interruptible(sk_sleep(&cf_sk->sk));
 		break;
 
 	case CAIF_CTRLCMD_DEINIT_RSP:
@@ -225,8 +225,8 @@ static void caif_sktflowctrl_cb(struct cflayer *layr,
 		caif_assert(!STATE_IS_OPEN(cf_sk));
 		SET_PENDING_OFF(cf_sk);
 		if (!STATE_IS_PENDING_DESTROY(cf_sk)) {
-			if (cf_sk->sk.sk_sleep != NULL)
-				wake_up_interruptible(cf_sk->sk.sk_sleep);
+			if (sk_sleep(&cf_sk->sk) != NULL)
+				wake_up_interruptible(sk_sleep(&cf_sk->sk));
 		}
 		dbfs_atomic_inc(&cnt.num_deinit);
 		sock_put(&cf_sk->sk);
@@ -238,7 +238,7 @@ static void caif_sktflowctrl_cb(struct cflayer *layr,
 		SET_STATE_CLOSED(cf_sk);
 		SET_PENDING_OFF(cf_sk);
 		SET_TX_FLOW_OFF(cf_sk);
-		wake_up_interruptible(cf_sk->sk.sk_sleep);
+		wake_up_interruptible(sk_sleep(&cf_sk->sk));
 		break;
 
 	case CAIF_CTRLCMD_REMOTE_SHUTDOWN_IND:
@@ -247,7 +247,7 @@ static void caif_sktflowctrl_cb(struct cflayer *layr,
 		/* Use sk_shutdown to indicate remote shutdown indication */
 		cf_sk->sk.sk_shutdown |= RCV_SHUTDOWN;
 		cf_sk->file_mode = 0;
-		wake_up_interruptible(cf_sk->sk.sk_sleep);
+		wake_up_interruptible(sk_sleep(&cf_sk->sk));
 		break;
 
 	default:
@@ -325,7 +325,7 @@ static int caif_recvmsg(struct kiocb *iocb, struct socket *sock,
 		release_sock(&cf_sk->sk);
 
 		result =
-		    wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		    wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 					     !STATE_IS_PENDING(cf_sk));
 
 		lock_sock(&(cf_sk->sk));
@@ -365,7 +365,7 @@ static int caif_recvmsg(struct kiocb *iocb, struct socket *sock,
 		release_sock(&cf_sk->sk);
 
 		/* Block reader until data arrives or socket is closed. */
-		if (wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		if (wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 					cfpkt_qpeek(cf_sk->pktq)
 					|| STATE_IS_REMOTE_SHUTDOWN(cf_sk)
 					|| !STATE_IS_OPEN(cf_sk)) ==
@@ -537,7 +537,7 @@ static int caif_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		 * for its conclusion.
 		 */
 		result =
-		    wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		    wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 					     !STATE_IS_PENDING(cf_sk));
 		/* I want to be alone on cf_sk (except status and queue) */
 		lock_sock(&(cf_sk->sk));
@@ -573,7 +573,7 @@ static int caif_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		release_sock(&cf_sk->sk);
 
 		/* Wait until flow is on or socket is closed */
-		if (wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		if (wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 					TX_FLOW_IS_ON(cf_sk)
 					|| !STATE_IS_OPEN(cf_sk)
 					|| STATE_IS_REMOTE_SHUTDOWN(cf_sk)
@@ -650,7 +650,7 @@ static int caif_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		release_sock(&cf_sk->sk);
 
 		/* Wait until flow is on or socket is closed */
-		if (wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		if (wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 					TX_FLOW_IS_ON(cf_sk)
 					|| !STATE_IS_OPEN(cf_sk)
 					|| STATE_IS_REMOTE_SHUTDOWN(cf_sk)
@@ -898,7 +898,7 @@ static int caif_connect(struct socket *sock, struct sockaddr *uservaddr,
 			 * for its conclusion.
 			 */
 			result =
-			    wait_event_interruptible(*cf_sk->sk.sk_sleep,
+			    wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 						     !STATE_IS_PENDING(cf_sk));
 
 			lock_sock(&(cf_sk->sk));
@@ -965,7 +965,7 @@ static int caif_connect(struct socket *sock, struct sockaddr *uservaddr,
 		release_sock(&cf_sk->sk);
 
 		result =
-		    wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		    wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 					     !STATE_IS_PENDING(cf_sk));
 
 		lock_sock(&(cf_sk->sk));
@@ -1107,7 +1107,7 @@ static int caif_release(struct socket *sock)
 	 * CAIF stack.
 	 */
 	if (!(sock->file->f_flags & O_NONBLOCK)) {
-		res = wait_event_interruptible(*cf_sk->sk.sk_sleep,
+		res = wait_event_interruptible(*sk_sleep(&cf_sk->sk),
 						!STATE_IS_PENDING(cf_sk));
 
 		if (res == -ERESTARTSYS) {
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index 60c2b94..0c65013 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -91,7 +91,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 			/* wait for a message to turn up */
 			release_sock(&rx->sk);
-			prepare_to_wait_exclusive(rx->sk.sk_sleep, &wait,
+			prepare_to_wait_exclusive(sk_sleep(&rx->sk), &wait,
 						  TASK_INTERRUPTIBLE);
 			ret = sock_error(&rx->sk);
 			if (ret)
@@ -102,7 +102,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 					goto wait_interrupted;
 				timeo = schedule_timeout(timeo);
 			}
-			finish_wait(rx->sk.sk_sleep, &wait);
+			finish_wait(sk_sleep(&rx->sk), &wait);
 			lock_sock(&rx->sk);
 			continue;
 		}
@@ -356,7 +356,7 @@ csum_copy_error:
 wait_interrupted:
 	ret = sock_intr_errno(timeo);
 wait_error:
-	finish_wait(rx->sk.sk_sleep, &wait);
+	finish_wait(sk_sleep(&rx->sk), &wait);
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	if (copied)



^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox