Netdev List
 help / color / mirror / Atom feed
* Re: [patch] usbnet: fix 100% CPU use on suspended device
From: Alan Stern @ 2010-07-26 14:36 UTC (permalink / raw)
  To: Elly Jones; +Cc: David Miller, netdev, linux-usb
In-Reply-To: <AANLkTik=b3ps_iMOC=mjDs2quLwBPUKbiqn0nKU3EbB1@mail.gmail.com>

On Mon, 26 Jul 2010, Elly Jones wrote:

> On Mon, Jul 26, 2010 at 12:57 AM, David Miller <davem@davemloft.net> wrote:
> > From: Elly Jones <ellyjones@google.com>
> > Date: Wed, 21 Jul 2010 14:51:48 -0400
> >
> >> Subject: [patch] usbnet: fix 100% CPU use on suspended device
> >> From: Elly Jones <ellyjones@google.com>
> >>
> >> This patch causes the usbnet module not to attempt to submit URBs to the device
> >> if the device is not ready to accept them. This fixes a misbehavior trigged by
> >> the Qualcomm Gobi driver (released under GPL through their Code Aurora
> >> initiative) which causes the usbnet core to consume 100% of CPU attempting and
> >> failing to submit URBs. This patch is against Linus's 2.6 repo commit
> >> a9f7f2e74ae0e6a801a2433dc8e9124d73da0cb4.
> >> Signed-off-by: Elizabeth Jones <ellyjones@google.com>
> >
> > If the Qualcomm Gobi driver is the only one where this happens, maybe the
> > real problem is in that driver.
> 
> The member in question (dev->udev->can_submit) is documented as 'URBs
> may be submitted'. The existing code doesn't honor that flag. It is
> somewhat puzzling that (so far) only the Gobi driver seems to use that
> flag, but I don't think the bug lies in their driver here.

That flag is not intended for use by drivers but by the USB core.  (It
gets cleared only when the device is suspended.)  usbnet and Gobi
shouldn't need to look at it.

> > I'm not applying this until a USB person looks more deeply into this.
> 
> Alright. Can you suggest a particular USB person to bother?
> 
> >> ---
> >> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> >> index 81c76ad..df7e72e 100644
> >> --- a/drivers/net/usb/usbnet.c
> >> +++ b/drivers/net/usb/usbnet.c
> >> @@ -1172,6 +1172,7 @@ static void usbnet_bh (unsigned long param)
> >>       // or are we maybe short a few urbs?
> >>       } else if (netif_running (dev->net) &&
> >>                  netif_device_present (dev->net) &&
> >> +                dev->udev->can_submit &&
> >>                  !timer_pending (&dev->delay) &&
> >>                  !test_bit (EVENT_RX_HALT, &dev->flags)) {
> >>               int     temp = dev->rxq.qlen;

This isn't right.  The problem should be fixed some other way.  Under
what circumstances are URBs submitted incorrectly?

Alan Stern


^ permalink raw reply

* Re: lock_sock or sock_hold ?
From: Eric Dumazet @ 2010-07-26 14:43 UTC (permalink / raw)
  To: Neshama Parhoti; +Cc: netdev
In-Reply-To: <AANLkTim7LQ65RWvy+TQ+HtwjJM89zt4g12H6mFinQT0v@mail.gmail.com>

Le lundi 26 juillet 2010 à 17:00 +0300, Neshama Parhoti a écrit :
> hello everyone,
> 
> can you please be kind and help me understand the differences between
> lock_sock and sock_hold ?
> 
> i can see that lock_sock takes a spin lock (bh) to mark owned = 1, and
> then takes a mutex, whereas sock_hold only increases the atomic
> refcnt.
> 
> but how do i know which of them I should use ?
> 
> i'm puzzled as to when should those two different APIs be used..

sock_hold() only increments a refcount, so that you are sure nobody can
destroy the socket (and its memory) under you. But you cannot modify
socket state only with this refcount taken.

To get exclusive access to the socket you either :

1) Are in process context and use lock_sock().

2) Are in softirq context (input path for example) :
 2.1) Lookup the socket in protocol hash tables, and get a refcount on
it (by sock_hold() or other atomic operation on refcnt)

 2.2) Then, get semi exclusive access using bh_lock_sock()
 2.3) Check if another process already is using the socket (we
interrupted this process on same CPU, or run on another cpu)
   if (sock_owned_by_user(sk)) {
	// queue work to socket backlog (delayed work)
   } else {
	// process packet right now
   }



^ permalink raw reply

* Re: [patch] usbnet: fix 100% CPU use on suspended device
From: Elly Jones @ 2010-07-26 14:47 UTC (permalink / raw)
  To: Alan Stern
  Cc: David Miller, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <Pine.LNX.4.44L0.1007261033090.1550-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>

On Mon, Jul 26, 2010 at 10:36 AM, Alan Stern <stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz@public.gmane.org> wrote:
> On Mon, 26 Jul 2010, Elly Jones wrote:
>
>> On Mon, Jul 26, 2010 at 12:57 AM, David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:
>> > From: Elly Jones <ellyjones-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>> > Date: Wed, 21 Jul 2010 14:51:48 -0400
>> >
>> >> Subject: [patch] usbnet: fix 100% CPU use on suspended device
>> >> From: Elly Jones <ellyjones-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>> >>
>> >> This patch causes the usbnet module not to attempt to submit URBs to the device
>> >> if the device is not ready to accept them. This fixes a misbehavior trigged by
>> >> the Qualcomm Gobi driver (released under GPL through their Code Aurora
>> >> initiative) which causes the usbnet core to consume 100% of CPU attempting and
>> >> failing to submit URBs. This patch is against Linus's 2.6 repo commit
>> >> a9f7f2e74ae0e6a801a2433dc8e9124d73da0cb4.
>> >> Signed-off-by: Elizabeth Jones <ellyjones-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>> >
>> > If the Qualcomm Gobi driver is the only one where this happens, maybe the
>> > real problem is in that driver.
>>
>> The member in question (dev->udev->can_submit) is documented as 'URBs
>> may be submitted'. The existing code doesn't honor that flag. It is
>> somewhat puzzling that (so far) only the Gobi driver seems to use that
>> flag, but I don't think the bug lies in their driver here.
>
> That flag is not intended for use by drivers but by the USB core.  (It
> gets cleared only when the device is suspended.)  usbnet and Gobi
> shouldn't need to look at it.

Aha!

>> > I'm not applying this until a USB person looks more deeply into this.
>>
>> Alright. Can you suggest a particular USB person to bother?
>>
>> >> ---
>> >> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>> >> index 81c76ad..df7e72e 100644
>> >> --- a/drivers/net/usb/usbnet.c
>> >> +++ b/drivers/net/usb/usbnet.c
>> >> @@ -1172,6 +1172,7 @@ static void usbnet_bh (unsigned long param)
>> >>       // or are we maybe short a few urbs?
>> >>       } else if (netif_running (dev->net) &&
>> >>                  netif_device_present (dev->net) &&
>> >> +                dev->udev->can_submit &&
>> >>                  !timer_pending (&dev->delay) &&
>> >>                  !test_bit (EVENT_RX_HALT, &dev->flags)) {
>> >>               int     temp = dev->rxq.qlen;
>
> This isn't right.  The problem should be fixed some other way.  Under
> what circumstances are URBs submitted incorrectly?

When the device is autosuspended. What is the proper thing for a
device to do here?

> Alan Stern

-- Elly
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [Bug #15704] [r8169] WARNING: at net/sched/sch_generic.c
From: Sergey Senozhatsky @ 2010-07-26 14:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Sergey Senozhatsky, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Maciej Rutecki, netdev
In-Reply-To: <1280154873.2899.390.camel@edumazet-laptop>

[-- Attachment #1: Type: text/plain, Size: 4172 bytes --]

On (07/26/10 16:34), Eric Dumazet wrote:
> > Sorry for the dealy - I've been away.
> > So, the bad news are that this is probably the last time I can test this bug. This laptop will
> > no longer be available. Sorry.
> > 
> > 
> > [ 2599.495508] pktgen 2.73: Packet Generator for packet performance testing.
> > [ 2630.016019] ------------[ cut here ]------------
> > [ 2630.016031] WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0xc1/0x129()
> > [ 2630.016035] Hardware name: F3JC                
> > [ 2630.016038] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> > [ 2630.016041] Modules linked in: pktgen ipv6 arc4 ecb iwl3945 snd_hwdep iwlcore snd_seq_dummy snd_hda_codec_si3054 snd_hda_codec_realtek snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device mac80211 snd_pcm_oss snd_mixer_oss asus_laptop
> > snd_hda_intel sparse_keymap sdhci_pci sdhci mmc_core led_class cfg80211 snd_hda_codec snd_pcm snd_timer snd_page_alloc snd rng_core soundcore usb_storage sg psmouse evdev i2c_i801 serio_raw r8169 mii uhci_hcd ehci_hcd sd_mod sr_mod usbcore
> > cdrom ata_piix
> > [ 2630.016114] Pid: 4282, comm: kpktgend_0 Not tainted 2.6.35-rc6-dbg-git1-00136-g459ab2a-dirty #75
> > [ 2630.016118] Call Trace:
> > [ 2630.016125]  [<c102e3ae>] warn_slowpath_common+0x65/0x7a
> > [ 2630.016131]  [<c12695f3>] ? dev_watchdog+0xc1/0x129
> > [ 2630.016136]  [<c102e427>] warn_slowpath_fmt+0x26/0x2a
> > [ 2630.016142]  [<c12695f3>] dev_watchdog+0xc1/0x129
> > [ 2630.016149]  [<c10370b3>] ? run_timer_softirq+0x136/0x22b
> > [ 2630.016154]  [<c103710c>] run_timer_softirq+0x18f/0x22b
> > [ 2630.016160]  [<c10370b3>] ? run_timer_softirq+0x136/0x22b
> > [ 2630.016165]  [<c1269532>] ? dev_watchdog+0x0/0x129
> > [ 2630.016171]  [<c1032c35>] __do_softirq+0x88/0x10c
> > [ 2630.016177]  [<c1032ce8>] do_softirq+0x2f/0x47
> > [ 2630.016182]  [<c1032fa5>] irq_exit+0x38/0x75
> > [ 2630.016187]  [<c10159b8>] smp_apic_timer_interrupt+0x5f/0x6d
> > [ 2630.016195]  [<f80555e4>] ? pktgen_xmit+0xda2/0xe6f [pktgen]
> > [ 2630.016202]  [<c12c75a2>] apic_timer_interrupt+0x36/0x3c
> > [ 2630.016208]  [<f80555e4>] ? pktgen_xmit+0xda2/0xe6f [pktgen]
> > [ 2630.016214]  [<f80555e4>] ? pktgen_xmit+0xda2/0xe6f [pktgen]
> > [ 2630.016220]  [<c104007b>] ? __kfifo_peek_n+0x13/0x2b
> > [ 2630.016226]  [<c1032edd>] ? _local_bh_enable_ip+0x9d/0xb3
> > [ 2630.016231]  [<c1032efb>] local_bh_enable_ip+0x8/0xa
> > [ 2630.016236]  [<c12c6aaa>] _raw_spin_unlock_bh+0x2f/0x32
> > [ 2630.016242]  [<f80555e4>] pktgen_xmit+0xda2/0xe6f [pktgen]
> > [ 2630.016253]  [<f91707b8>] ? rtl8169_start_xmit+0x0/0x307 [r8169]
> > [ 2630.016261]  [<c11800d8>] ? plist_add+0x85/0x87
> > [ 2630.016267]  [<f80558ac>] ? pktgen_thread_worker+0x96/0x5c3 [pktgen]
> > [ 2630.016273]  [<c1029096>] ? get_parent_ip+0xb/0x31
> > [ 2630.016279]  [<c1029138>] ? sub_preempt_count+0x7c/0x89
> > [ 2630.016285]  [<f8055986>] pktgen_thread_worker+0x170/0x5c3 [pktgen]
> > [ 2630.016290]  [<c12c52bb>] ? schedule+0x535/0x545
> > [ 2630.016296]  [<c103fd96>] ? autoremove_wake_function+0x0/0x2f
> > [ 2630.016301]  [<c103fd96>] ? autoremove_wake_function+0x0/0x2f
> > [ 2630.016307]  [<f8055816>] ? pktgen_thread_worker+0x0/0x5c3 [pktgen]
> > [ 2630.016313]  [<c103faa6>] kthread+0x6a/0x6f
> > [ 2630.016318]  [<c103fa3c>] ? kthread+0x0/0x6f
> > [ 2630.016324]  [<c1002d82>] kernel_thread_helper+0x6/0x10
> > [ 2630.016328] ---[ end trace 0bdadff7f249bd0d ]---
> > [ 2630.032549] r8169 0000:02:00.0: eth0: link up
> > 
> > 
> > 	Sergey
>

Hello,
 
> r8169 driver misses a netif_carrier_off() call somewhere, probably when
> adapter is reset
> 
> Sergey, do you run a pktgen only test (sending packets, and not
> receiving packets during the test), or a mixed workload ?
>

Only sending. 

--
 PGDEV=/proc/net/pktgen/eth0
 echo "Configuring $PGDEV"
 pgset "$COUNT"
 pgset "$CLONE_SKB"
 pgset "$PKT_SIZE"
 pgset "$DELAY"
 pgset "dst 192.168.x.x"
 pgset "dst_mac  x:x:x:x:x:x"

 PGDEV=/proc/net/pktgen/pgctrl

 echo "Running..."
 pgset "start"
--

 
> r8169 is known to reset if the RX ring buffer is stressed too much.
> 

	Sergey

[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]

^ permalink raw reply

* [PATCH] [PATCH] s2io: fixing DBG_PRINT() macro
From: leitao @ 2010-07-26 15:07 UTC (permalink / raw)
  To: davem; +Cc: netdev, Breno Leitao

Patch 9e39f7c5b311a306977c5471f9e2ce4c456aa038 changed the
DBG_PRINT() macro and the if clause was wrongly changed. It means
that currently all the DBG_PRINT are being printed, flooding the
kernel log buffer with things like:

s2io: eth6: Next block at: c0000000b9c90000
s2io: eth6: In Neterion Tx routine

Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Acked-by: Sreenivasa Honnur <Sreenivasa.Honnur@neterion.com>
---
 drivers/net/s2io.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/s2io.h b/drivers/net/s2io.h
index 3645fb3..0af0335 100644
--- a/drivers/net/s2io.h
+++ b/drivers/net/s2io.h
@@ -65,7 +65,7 @@ static int debug_level = ERR_DBG;
 
 /* DEBUG message print. */
 #define DBG_PRINT(dbg_level, fmt, args...) do {			\
-	if (dbg_level >= debug_level)				\
+	if (dbg_level <= debug_level)				\
 		pr_info(fmt, ##args);				\
 	} while (0)
 
-- 
1.6.5.2


^ permalink raw reply related

* Re: [patch] usbnet: fix 100% CPU use on suspended device
From: Alan Stern @ 2010-07-26 15:13 UTC (permalink / raw)
  To: Elly Jones, Oliver Neukum; +Cc: David Miller, netdev, USB list
In-Reply-To: <AANLkTin_pduYCHVv_q4DSfSoQ7o4kFo43X8t1rXJ+7fa@mail.gmail.com>

On Mon, 26 Jul 2010, Elly Jones wrote:

> > This isn't right.  The problem should be fixed some other way.  Under
> > what circumstances are URBs submitted incorrectly?
> 
> When the device is autosuspended. What is the proper thing for a
> device to do here?

From looking at the code, it appears that the EVENT_DEV_ASLEEP flag
should be tested in usbnet_bh() the way it is in rx_submit().  But I'm
not an expert on usbnet; we should ask someone who is, like Oliver.

Alan Stern


^ permalink raw reply

* Re: [Bug #15704] [r8169] WARNING: at net/sched/sch_generic.c
From: Eric Dumazet @ 2010-07-26 15:20 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Maciej Rutecki, netdev
In-Reply-To: <20100726145742.GA4888-dY8u8AhHFaWtd10JCjopabkcH5ONE+aC@public.gmane.org>

Le lundi 26 juillet 2010 à 17:57 +0300, Sergey Senozhatsky a écrit :

> 
> Only sending. 
> 
> --
>  PGDEV=/proc/net/pktgen/eth0
>  echo "Configuring $PGDEV"
>  pgset "$COUNT"
>  pgset "$CLONE_SKB"
>  pgset "$PKT_SIZE"
>  pgset "$DELAY"
>  pgset "dst 192.168.x.x"
>  pgset "dst_mac  x:x:x:x:x:x"
> 
>  PGDEV=/proc/net/pktgen/pgctrl
> 
>  echo "Running..."
>  pgset "start"
> --
> 
>  

If you try with a DELAY=3000 (or 5000), what happens ?

Do you have TX pause enabled ? If yes, try to disable pauses.

# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate:	on
RX:		on
TX:		on

# ethtool -A eth0 rx off tx off
# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate:	on
RX:		off
TX:		off

^ permalink raw reply

* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Michael S. Tsirkin @ 2010-07-26 15:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C4BEAA2.6040301@kernel.org>

On Sun, Jul 25, 2010 at 09:41:22AM +0200, Tejun Heo wrote:
> Hello,
> 
> On 07/24/2010 09:14 PM, Michael S. Tsirkin wrote:
> >> I've created kthread_worker in wq#for-next tree and already converted
> >> ivtv to use it.  Once this lands in mainline, I think converting vhost
> >> to use it would be better choice.  kthread worker code uses basically
> >> the same logic used in the vhost_workqueue code but is better
> >> organized and documented.  So, I think it would be better to stick
> >> with the original implementation, as otherwise we're likely to just
> >> decrease test coverage without much gain.
> >>
> >>   http://git.kernel.org/?p=linux/kernel/git/tj/wq.git;a=commitdiff;h=b56c0d8937e665a27d90517ee7a746d0aa05af46;hp=53c5f5ba42c194cb13dd3083ed425f2c5b1ec439
> > 
> > Sure, if we keep using workqueue. But I'd like to investigate this
> > direction a bit more because there's discussion to switching from kthread to
> > regular threads altogether.
> 
> Hmmm? It doesn't have much to do with workqueue.  kthread_worker is a
> simple wrapper around kthread.  It now assumes kthread but changing it
> to be useable with any thread shouldn't be too hard.  Wouldn't that be
> better?

BTW, kthread_worker would benefit from the optimization I implemented
here as well.

-- 
MST

^ permalink raw reply

* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Tejun Heo @ 2010-07-26 15:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100726152510.GA26223@redhat.com>

Hello,

On 07/26/2010 05:25 PM, Michael S. Tsirkin wrote:
> BTW, kthread_worker would benefit from the optimization I implemented
> here as well.

Hmmm... I'm not quite sure whether it's an optimization.  I thought
the patch was due to feeling uncomfortable about using barriers?  Is
it an optimization?

Thanks.

-- 
tejun

^ permalink raw reply

* slub numa: Fix rare allocation from unexpected node
From: Christoph Lameter @ 2010-07-26 15:41 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-mm, jamal, netdev

Subject: slub numa: Fix rare allocation from unexpected node

The network developers have seen sporadic allocations resulting in objects
coming from unexpected NUMA nodes despite asking for objects from a
specific node.

This is due to get_partial() calling get_any_partial() if partial
slabs are exhausted for a node even if a node was specified and therefore
one would expect allocations only from the specified node.

get_any_partial() sporadically may return a slab from a foreign
node to gradually reduce the size of partial lists on remote nodes
and thereby reduce total memory use for a slab cache.

The behavior is controlled by the remote_defrag_ratio of each cache.

Strictly speaking this is permitted behavior since __GFP_THISNODE was
not specified for the allocation but it is certain surprising.

This patch makes sure that the remote defrag behavior only occurs
if no node was specified.

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 mm/slub.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2010-07-23 09:24:11.000000000 -0500
+++ linux-2.6/mm/slub.c	2010-07-23 09:25:15.000000000 -0500
@@ -1390,7 +1390,7 @@ static struct page *get_partial(struct k
 	int searchnode = (node == -1) ? numa_node_id() : node;

 	page = get_partial_node(get_node(s, searchnode));
-	if (page || (flags & __GFP_THISNODE))
+	if (page || node != -1)
 		return page;

 	return get_any_partial(s, flags);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Tejun Heo @ 2010-07-26 15:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C4DAB14.5050809@kernel.org>

On 07/26/2010 05:34 PM, Tejun Heo wrote:
> Hello,
> 
> On 07/26/2010 05:25 PM, Michael S. Tsirkin wrote:
>> BTW, kthread_worker would benefit from the optimization I implemented
>> here as well.
> 
> Hmmm... I'm not quite sure whether it's an optimization.  I thought
> the patch was due to feeling uncomfortable about using barriers?  Is
> it an optimization?

Yeah, one less smp_mb() in execution path.  The lock dancing in
flush() is ugly but then again mucking with barriers could be harder
to understand.  Care to send a patch against wq#for-next tree?

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH] sky2: Fix most checkpatch errors
From: Stephen Hemminger @ 2010-07-26 15:46 UTC (permalink / raw)
  To: Mike McCormack; +Cc: netdev
In-Reply-To: <4C4D8D67.7080008@ring3k.org>

On Mon, 26 Jul 2010 22:28:07 +0900
Mike McCormack <mikem@ring3k.org> wrote:

> Tidy up whitespace and formatting to reduce from
>  38 errors to 4 and 20 fewer warnings.
> 
> Signed-off-by: Mike McCormack <mikem@ring3k.org>

Please don't bother fixing the stupid checkpatch whitespace noise.
Any patch that just substitutes space for tab (or vice versa) with
no human visible impact is just stupid.

But there are some useful tidbits in there.

--- a/drivers/net/sky2.c	2010-07-26 08:40:37.554612692 -0700
+++ b/drivers/net/sky2.c	2010-07-26 08:44:18.683011839 -0700
@@ -79,7 +79,7 @@
 
 #define SKY2_EEPROM_MAGIC	0x9955aabb
 
-#define RING_NEXT(x,s)	(((x)+1) & ((s)-1))
+#define RING_NEXT(x, s)	(((x)+1) & ((s)-1))
 
 static const u32 default_msg =
     NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK
@@ -172,7 +172,7 @@ static int gm_phy_write(struct sky2_hw *
 		udelay(10);
 	}
 
-	dev_warn(&hw->pdev->dev,"%s: phy write timeout\n", hw->dev[port]->name);
+	dev_warn(&hw->pdev->dev, "%s: phy write timeout\n", hw->dev[port]->name);
 	return -ETIMEDOUT;
 
 io_error:
@@ -1067,7 +1067,7 @@ static inline struct sky2_rx_le *sky2_ne
 	return le;
 }
 
-static unsigned sky2_get_rx_threshold(struct sky2_port* sky2)
+static unsigned sky2_get_rx_threshold(struct sky2_port *sky2)
 {
 	unsigned size;
 
@@ -1078,7 +1078,7 @@ static unsigned sky2_get_rx_threshold(st
 	return (size - 8) / sizeof(u32);
 }
 
-static unsigned sky2_get_rx_data_size(struct sky2_port* sky2)
+static unsigned sky2_get_rx_data_size(struct sky2_port *sky2)
 {
 	struct rx_ring_info *re;
 	unsigned size;
@@ -3014,7 +3014,7 @@ static int __devinit sky2_init(struct sk
 	hw->chip_id = sky2_read8(hw, B2_CHIP_ID);
 	hw->chip_rev = (sky2_read8(hw, B2_MAC_CFG) & CFG_CHIP_R_MSK) >> 4;
 
-	switch(hw->chip_id) {
+	switch (hw->chip_id) {
 	case CHIP_ID_YUKON_XL:
 		hw->flags = SKY2_HW_GIGABIT | SKY2_HW_NEWER_PHY;
 		if (hw->chip_rev < CHIP_REV_YU_XL_A2)
@@ -3685,7 +3685,7 @@ static int sky2_set_mac_address(struct n
 	return 0;
 }
 
-static void inline sky2_add_filter(u8 filter[8], const u8 *addr)
+static inline void sky2_add_filter(u8 filter[8], const u8 *addr)
 {
 	u32 bit;
 
@@ -3911,7 +3911,7 @@ static int sky2_set_coalesce(struct net_
 		return -EINVAL;
 	if (ecmd->rx_max_coalesced_frames > RX_MAX_PENDING)
 		return -EINVAL;
-	if (ecmd->rx_max_coalesced_frames_irq >RX_MAX_PENDING)
+	if (ecmd->rx_max_coalesced_frames_irq > RX_MAX_PENDING)
 		return -EINVAL;
 
 	if (ecmd->tx_coalesce_usecs == 0)
@@ -4372,7 +4372,7 @@ static int sky2_debug_show(struct seq_fi
 			seq_printf(seq, "%u:", idx);
 		sop = 0;
 
-		switch(le->opcode & ~HW_OWNER) {
+		switch (le->opcode & ~HW_OWNER) {
 		case OP_ADDR64:
 			seq_printf(seq, " %#x:", a);
 			break;
@@ -4441,7 +4441,7 @@ static int sky2_device_event(struct noti
 	if (dev->netdev_ops->ndo_open != sky2_up || !sky2_debug)
 		return NOTIFY_DONE;
 
-	switch(event) {
+	switch (event) {
 	case NETDEV_CHANGENAME:
 		if (sky2->debugfs) {
 			sky2->debugfs = debugfs_rename(sky2_debug, sky2->debugfs,
@@ -4636,7 +4636,7 @@ static int __devinit sky2_test_msi(struc
 	struct pci_dev *pdev = hw->pdev;
 	int err;
 
-	init_waitqueue_head (&hw->msi_wait);
+	init_waitqueue_head(&hw->msi_wait);
 
 	sky2_write32(hw, B0_IMSK, Y2_IS_IRQ_SW);
 

^ permalink raw reply

* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Michael S. Tsirkin @ 2010-07-26 15:50 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C4DAB14.5050809@kernel.org>

On Mon, Jul 26, 2010 at 05:34:44PM +0200, Tejun Heo wrote:
> Hello,
> 
> On 07/26/2010 05:25 PM, Michael S. Tsirkin wrote:
> > BTW, kthread_worker would benefit from the optimization I implemented
> > here as well.
> 
> Hmmm... I'm not quite sure whether it's an optimization.  I thought
> the patch was due to feeling uncomfortable about using barriers?

Oh yes. But getting rid of barriers is what motivated me originally.

>  Is it an optimization?
> 
> Thanks.

Yes, sure. This removes atomic read and 2 barrier operations on data path.  And
it does not add any new synchronization: instead, we reuse the lock that we
take anyway.  The relevant part is:


+               if (work) {
+                       __set_current_state(TASK_RUNNING);
+                       work->fn(work);
+               } else
+                       schedule();

-       if (work) {
-               __set_current_state(TASK_RUNNING);
-               work->fn(work);
-               smp_wmb();      /* wmb worker-b0 paired with flush-b1 */
-               work->done_seq = work->queue_seq;
-               smp_mb();       /* mb worker-b1 paired with flush-b0 */
-               if (atomic_read(&work->flushing))
-                       wake_up_all(&work->done);
-       } else
-               schedule();
-
-       goto repeat;

Is there a git tree with kthread_worker applied?
I might do this just for fun ...


> -- 
> tejun

^ permalink raw reply

* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Michael S. Tsirkin @ 2010-07-26 15:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C4DADD6.90507@kernel.org>

On Mon, Jul 26, 2010 at 05:46:30PM +0200, Tejun Heo wrote:
> On 07/26/2010 05:34 PM, Tejun Heo wrote:
> > Hello,
> > 
> > On 07/26/2010 05:25 PM, Michael S. Tsirkin wrote:
> >> BTW, kthread_worker would benefit from the optimization I implemented
> >> here as well.
> > 
> > Hmmm... I'm not quite sure whether it's an optimization.  I thought
> > the patch was due to feeling uncomfortable about using barriers?  Is
> > it an optimization?
> 
> Yeah, one less smp_mb() in execution path.  The lock dancing in
> flush() is ugly but then again mucking with barriers could be harder
> to understand.  Care to send a patch against wq#for-next tree?
> 
> Thanks.

Sure. Where's that, exactly?

> -- 
> tejun

^ permalink raw reply

* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Tejun Heo @ 2010-07-26 16:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100726155014.GA26412@redhat.com>

Hello,

On 07/26/2010 05:50 PM, Michael S. Tsirkin wrote:
>> Hmmm... I'm not quite sure whether it's an optimization.  I thought
>> the patch was due to feeling uncomfortable about using barriers?
> 
> Oh yes. But getting rid of barriers is what motivated me originally.

Yeah, getting rid of barriers is always good.  :-)

> Is there a git tree with kthread_worker applied?
> I might do this just for fun ...

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-next

For the original implementaiton, please take a look at commit
b56c0d8937e665a27d90517ee7a746d0aa05af46.

* Can you please keep the outer goto repeat loop?  I just don't like
  outermost for (;;).

* Placing try_to_freeze() could be a bit annoying.  It shouldn't be
  executed when there's a work to flush.

* I think A - B <= 0 test would be more familiar.  At least
  time_before/after() are implemented that way.

Thanks.

-- 
tejun

^ permalink raw reply

* RE: e1000e crashes with 2.6.34.x and ThinkPad T60
From: Allan, Bruce W @ 2010-07-26 16:13 UTC (permalink / raw)
  To: Marc Haber, Linux Kernel Developers,
	Linux Kernel Network Developers, "e1000-devel@lists.sourcef
In-Reply-To: <20100724092644.GA13353@torres.zugschlus.de>

On Saturday, July 24, 2010 2:27 AM, Marc Haber wrote:
> Hi,
> 
> I have a new notebook, a Thinkpad T60, which is freezing in random
> intervals (like 30 minutes to two days) as long as I am using the
> on-board wired ethernet interface, which is an e1000e, [8086:109a]. As
> long as I keep using the WLAN, the system runs for weeks despite
> frequent suspend/resume cycles etc. The crashes seem really to be tied
> to using the wired ethernet. This is a hard freeze, with nothing
> happening on the system, only a long push on the power button helps.
> 
> Additionally, sometimes, probably after suspend/resume, the wired
> ethernet does not come up properly again, ip addr claims "NO CARRIER"
> even if the LEDs on the interface and on the switch claim that there
> was a link. No packets are received by the interface when it's at this
> stage.
> 
> Both issues appear with 2.6.34 and 2.6.34.1. I didn't try any of these
> issues with an older kernel, 2.6.34 was already out when I started
> using the T60.
> 
> To rule out defective hardware, I have tried with a second T60, with
> the same results.
> 
> Full dmesg and lspci-nn attached, please say if you need more.
> 
> Greetings
> Marc
> 
> P.S.: Please Cc: me on replies, both linux-kernel and netdev are too
> big for me to timely follow. I am subscribed to both lists, but a Cc
> helps in getting a faster reply.
> 
> --
> -----------------------------------------------------------------------------
> Marc Haber         | "I don't trust Computers. They | Mailadresse im
> Header Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49
> 621 72739834 Nordisch by Nature |  How to make an American Quilt |
> Fax: *49 3221 2323190 

Adding e1000-devel (the Intel LAN developers list).

Please supply the full dmesg you meant to attach with the original report, as well as the output of lspci -vvv.

Thanks,
Bruce.

^ permalink raw reply

* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Tejun Heo @ 2010-07-26 16:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C4DB247.9060709@kernel.org>

Just one more thing.

On 07/26/2010 06:05 PM, Tejun Heo wrote:
> * Placing try_to_freeze() could be a bit annoying.  It shouldn't be
>   executed when there's a work to flush.

* Similar issue exists for kthread_stop().  The kthread shouldn't exit
  while there's a work to flush (please note that kthread_worker
  interface allows detaching / attaching worker kthread during
  operation, so it should remain in consistent state with regard to
  flushing).

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [patch] usbnet: fix 100% CPU use on suspended device
From: Oliver Neukum @ 2010-07-26 16:21 UTC (permalink / raw)
  To: Alan Stern
  Cc: Elly Jones, David Miller, netdev-u79uwXL29TY76Z2rM5mHXA, USB list
In-Reply-To: <Pine.LNX.4.44L0.1007261106350.1550-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>

Am Montag, 26. Juli 2010, 17:13:23 schrieb Alan Stern:
> On Mon, 26 Jul 2010, Elly Jones wrote:
> 
> > > This isn't right.  The problem should be fixed some other way.  Under
> > > what circumstances are URBs submitted incorrectly?
> > 
> > When the device is autosuspended. What is the proper thing for a
> > device to do here?
> 
> From looking at the code, it appears that the EVENT_DEV_ASLEEP flag
> should be tested in usbnet_bh() the way it is in rx_submit().  But I'm
> not an expert on usbnet; we should ask someone who is, like Oliver.

Sorry, I didn't notice this thread.

The correct way to check for autosuspend in usbnet is to look
at EVENT_DEV_ASLEEP under txq.lock. That being said, usbnet_bh()
uses rx_submit() which does the correct check. The bug seems to be
a lack of error handling in usbnet_bh() regarding the return of rx_submit()

	Regards
		Oliver
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Michael S. Tsirkin @ 2010-07-26 16:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C4DB466.6000409@kernel.org>

On Mon, Jul 26, 2010 at 06:14:30PM +0200, Tejun Heo wrote:
> Just one more thing.
> 
> On 07/26/2010 06:05 PM, Tejun Heo wrote:
> > * Placing try_to_freeze() could be a bit annoying.  It shouldn't be
> >   executed when there's a work to flush.

BTW why is this important?
We could always get another work and flush right after
try_to_freeze, and then flush would block for a long time.


BTW the vhost patch you sent does not do this at all.
I am guessing it is because our thread is not freezable?

> * Similar issue exists for kthread_stop().  The kthread shouldn't exit
>   while there's a work to flush (please note that kthread_worker
>   interface allows detaching / attaching worker kthread during
>   operation, so it should remain in consistent state with regard to
>   flushing).
> 
> Thanks.

Not sure I agree here. Users must synchronise flush and stop calls.
Otherwise a work might get queued after stop is called, and
you won't be able to flush it.


> -- 
> tejun

^ permalink raw reply

* Re: [PATCH] ip: correctly report 802.15.4 link type
From: Stephen Hemminger @ 2010-07-26 16:35 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: stephen.hemminger, netdev
In-Reply-To: <1280089683-12020-2-git-send-email-jengelh@medozas.de>

On Sun, 25 Jul 2010 22:28:02 +0200
Jan Engelhardt <jengelh@medozas.de> wrote:

> Up until now, the "hardwpan" devices were displayed as link/[804].
> 
> Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
> ---
>  lib/ll_types.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/lib/ll_types.c b/lib/ll_types.c
> index 846cdb0..1cc46b6 100644
> --- a/lib/ll_types.c
> +++ b/lib/ll_types.c
> @@ -125,6 +125,9 @@ __PF(IEEE80211_PRISM,ieee802.11/prism)
>  #ifdef ARPHRD_IEEE80211_RADIOTAP
>  __PF(IEEE80211_RADIOTAP,ieee802.11/radiotap)
>  #endif
> +#ifdef ARPHRD_IEEE802154
> +__PF(IEEE802154, ieee802.15.4)
> +#endif
>  #ifdef ARPHRD_NONE
>  __PF(NONE, none)
>  #endif

Already fixed.
-- 
commit 4dbda0f482b8947d064b3f82992394033de6616c
Author: Stephen Hemminger <stephen.hemminger@vyatta.com>
Date:   Fri Jul 23 13:12:12 2010 -0700

    Update ARP header type table
    
    Add all current values. Since if_arp.h is included, get rid
    of ifdefs'. Make table constant.

^ permalink raw reply

* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Michael S. Tsirkin @ 2010-07-26 16:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C4DB466.6000409@kernel.org>

On Mon, Jul 26, 2010 at 06:14:30PM +0200, Tejun Heo wrote:
> Just one more thing.

I noticed that with vhost, flush_work was getting the worker
pointer as well. Can we live with this API change?

-- 
MST

^ permalink raw reply

* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Michael S. Tsirkin @ 2010-07-26 16:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C4DB466.6000409@kernel.org>

Here's an untested patch forward-ported from vhost
(works fine for vhost).

kthread_worker: replace barriers+atomics with a lock

We can save some cycles and make code simpler by
reusing worker lock for flush, instead of atomics.
flush_kthread_work needs to get worker pointer for
this to work.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 685ea65..19ae9f2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -58,7 +58,7 @@ struct kthread_work {
 	struct list_head	node;
 	kthread_work_func_t	func;
 	wait_queue_head_t	done;
-	atomic_t		flushing;
+	int			flushing;
 	int			queue_seq;
 	int			done_seq;
 };
@@ -72,7 +72,7 @@ struct kthread_work {
 	.node = LIST_HEAD_INIT((work).node),				\
 	.func = (fn),							\
 	.done = __WAIT_QUEUE_HEAD_INITIALIZER((work).done),		\
-	.flushing = ATOMIC_INIT(0),					\
+	.flushing = 0,							\
 	}
 
 #define DEFINE_KTHREAD_WORKER(worker)					\
@@ -96,7 +96,8 @@ int kthread_worker_fn(void *worker_ptr);
 
 bool queue_kthread_work(struct kthread_worker *worker,
 			struct kthread_work *work);
-void flush_kthread_work(struct kthread_work *work);
+void flush_kthread_work(struct kthread_worker *worker,
+			struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
 #endif /* _LINUX_KTHREAD_H */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 2dc3786..461f58d 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -283,10 +283,12 @@ int kthreadd(void *unused)
 int kthread_worker_fn(void *worker_ptr)
 {
 	struct kthread_worker *worker = worker_ptr;
-	struct kthread_work *work;
+	struct kthread_work *work = NULL;
 
+	spin_lock_irq(&worker->lock);
 	WARN_ON(worker->task);
 	worker->task = current;
+	spin_unlock_irq(&worker->lock);
 repeat:
 	set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */
 
@@ -298,23 +300,23 @@ repeat:
 		return 0;
 	}
 
-	work = NULL;
 	spin_lock_irq(&worker->lock);
+	if (work) {
+		work->done_seq = work->queue_seq;
+		if (work->flushing)
+			wake_up_all(&work->done);
+	}
 	if (!list_empty(&worker->work_list)) {
 		work = list_first_entry(&worker->work_list,
 					struct kthread_work, node);
 		list_del_init(&work->node);
-	}
+	} else
+		work = NULL;
 	spin_unlock_irq(&worker->lock);
 
 	if (work) {
 		__set_current_state(TASK_RUNNING);
 		work->func(work);
-		smp_wmb();	/* wmb worker-b0 paired with flush-b1 */
-		work->done_seq = work->queue_seq;
-		smp_mb();	/* mb worker-b1 paired with flush-b0 */
-		if (atomic_read(&work->flushing))
-			wake_up_all(&work->done);
 	} else if (!freezing(current))
 		schedule();
 
@@ -353,31 +355,33 @@ EXPORT_SYMBOL_GPL(queue_kthread_work);
 
 /**
  * flush_kthread_work - flush a kthread_work
+ * @worker: where work might be running
  * @work: work to flush
  *
  * If @work is queued or executing, wait for it to finish execution.
  */
-void flush_kthread_work(struct kthread_work *work)
+void flush_kthread_work(struct kthread_worker *worker,
+			struct kthread_work *work)
 {
-	int seq = work->queue_seq;
+	int seq
 
-	atomic_inc(&work->flushing);
-
-	/*
-	 * mb flush-b0 paired with worker-b1, to make sure either
-	 * worker sees the above increment or we see done_seq update.
-	 */
-	smp_mb__after_atomic_inc();
+	spin_lock_irq(&worker->lock);
+	seq = work->queue_seq;
+	++work->flushing;
+	spin_unlock_irq(&worker->lock);
 
 	/* A - B <= 0 tests whether B is in front of A regardless of overflow */
-	wait_event(work->done, seq - work->done_seq <= 0);
-	atomic_dec(&work->flushing);
-
-	/*
-	 * rmb flush-b1 paired with worker-b0, to make sure our caller
-	 * sees every change made by work->func().
-	 */
-	smp_mb__after_atomic_dec();
+	wait_event(work->done,
+		   ({
+			int done;
+			spin_lock_irq(&worker->lock);
+		    	delta = seq - work->done_seq <= 0;
+			spin_unlock_irq(&worker->lock);
+			done;
+		   });
+	spin_lock_irq(&worker->lock);
+	--work->flushing;
+	spin_unlock_irq(&worker->lock);
 }
 EXPORT_SYMBOL_GPL(flush_kthread_work);
 

^ permalink raw reply related

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Michael S. Tsirkin @ 2010-07-26 17:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Sridhar Samudrala, Tejun Heo, Ingo Molnar, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100702210637.GA12433@redhat.com>

On Fri, Jul 02, 2010 at 11:06:37PM +0200, Oleg Nesterov wrote:
> On 07/02, Peter Zijlstra wrote:
> >
> > On Fri, 2010-07-02 at 11:01 -0700, Sridhar Samudrala wrote:
> > >
> > >  Does  it (Tejun's kthread_clone() patch) also  inherit the
> > > cgroup of the caller?
> >
> > Of course, its a simple do_fork() which inherits everything just as you
> > would expect from a similar sys_clone()/sys_fork() call.
> 
> Yes. And I'm afraid it can inherit more than we want. IIUC, this is called
> from ioctl(), right?
> 
> Then the new thread becomes the natural child of the caller, and it shares
> ->mm with the parent. And files, dup_fd() without CLONE_FS.
> 
> Signals. Say, if you send SIGKILL to this new thread, it can't sleep in
> TASK_INTERRUPTIBLE or KILLABLE after that. And this SIGKILL can be sent
> just because the parent gets SIGQUIT or abother coredumpable signal.
> Or the new thread can recieve SIGSTOP via ^Z.
> 
> Perhaps this is OK, I do not know. Just to remind that kernel_thread()
> is merely clone(CLONE_VM).
> 
> Oleg.

With some machinery to stop it later, yes.
Oleg, how does the below look to you?

Here I explicitly drop the fds so we don't share them.
CLONE_VM takes care of sharing the mm I think.
About signals - for the vhost-net use this is OK as we use
uninterruptible sleep anyway (like the new kthread_worker does).

This code seems to work fine for me so far - any comments?

---

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index aabc8a1..72c7b17 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -9,6 +9,11 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
 				   const char namefmt[], ...)
 	__attribute__((format(printf, 3, 4)));
 
+struct task_struct *kthread_create_inherit(int (*threadfn)(void *data),
+					   void *data,
+					   const char namefmt[], ...)
+	__attribute__((format(printf, 3, 4)));
+
 /**
  * kthread_run - create and wake a thread.
  * @threadfn: the function to run until signal_pending(current).
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 83911c7..b81588c 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -149,6 +149,38 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
 }
 EXPORT_SYMBOL(kthread_create);
 
+/* Same as kthread_create, but inherit attributes (cgroups, priority, CPU mask)
+ * from current. */
+struct task_struct *kthread_create_inherit(int (*threadfn)(void *data),
+					   void *data,
+					   const char namefmt[],
+					   ...)
+{
+	struct kthread_create_info create;
+
+	create.threadfn = threadfn;
+	create.data = data;
+	init_completion(&create.done);
+
+	create_kthread(&create);
+	wait_for_completion(&create.done);
+
+	if (!IS_ERR(create.result)) {
+		va_list args;
+
+		/* Don't share files with parent as drivers use release for
+		 * close on exit, etc. */
+		exit_files(create.result);
+
+		va_start(args, namefmt);
+		vsnprintf(create.result->comm, sizeof(create.result->comm),
+			  namefmt, args);
+		va_end(args);
+	}
+	return create.result;
+}
+EXPORT_SYMBOL(kthread_create_inherit);
+
 /**
  * kthread_bind - bind a just-created kthread to a cpu.
  * @p: thread created by kthread_create().

^ permalink raw reply related

* Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Michael S. Tsirkin @ 2010-07-26 16:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
	kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C4DB247.9060709@kernel.org>

On Mon, Jul 26, 2010 at 06:05:27PM +0200, Tejun Heo wrote:
> Hello,
> 
> On 07/26/2010 05:50 PM, Michael S. Tsirkin wrote:
> >> Hmmm... I'm not quite sure whether it's an optimization.  I thought
> >> the patch was due to feeling uncomfortable about using barriers?
> > 
> > Oh yes. But getting rid of barriers is what motivated me originally.
> 
> Yeah, getting rid of barriers is always good.  :-)
> 
> > Is there a git tree with kthread_worker applied?
> > I might do this just for fun ...
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-next
> 
> For the original implementaiton, please take a look at commit
> b56c0d8937e665a27d90517ee7a746d0aa05af46.
> 
> * Can you please keep the outer goto repeat loop?  I just don't like
>   outermost for (;;).

Okay ... can we put the code in a {} scope to make it clear
where does the loop starts and ends?

> * Placing try_to_freeze() could be a bit annoying.  It shouldn't be
>   executed when there's a work to flush.

It currently seems to be executed when there is work to flush.
Is this wrong?

> * I think A - B <= 0 test would be more familiar.  At least
>   time_before/after() are implemented that way.

I am concerned that this overflows a signed integer -
which I seem to remeber that C99 disallows.
timer macros are on data path so might be worth the risk there,
but flush is slow path so better be safe?

> Thanks.
> 
> -- 
> tejun

^ permalink raw reply

* tcpdump <-> FCoE support
From: Loke, Chetan @ 2010-07-26 17:30 UTC (permalink / raw)
  To: devel-s9riP+hp16TNLxjTenLetw, netdev-u79uwXL29TY76Z2rM5mHXA

Folks,

13:08:21.907971 00:yy:yy:yy:yy:yy (oui Unknown) > 00:xx:xx:xx:xx:xx (oui
Unknown), ethertype Unknown (0x8906)

On: tcpdump-4.0.0-3, tcpdump doesn't seem to have FCoE support(unless I
capture it and pipe it to wireshark). Is there any effort currently in
progress on adding FCoE parsers?


Chetan Loke

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox