public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] firewire: net: fix panic in fwnet_write_complete
       [not found]         ` <4B547154.40700@s5r6.in-berlin.de>
@ 2010-01-18 21:36           ` Stefan Richter
  2010-01-19  0:42             ` Stefan Richter
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Richter @ 2010-01-18 21:36 UTC (permalink / raw)
  To: Илья Басин
  Cc: linux1394-devel, linux-kernel

Date: 
From: Stefan Richter <stefanr@s5r6.in-berlin.de>
Subject: firewire: net: fix panic in fwnet_write_complete

In the transmit path of firewire-net (IPv4 over 1394), the following
race condition may occur:
  - The networking soft IRQ inserts a datagram into the 1394 async
    request transmit DMA.
  - The 1394 async transmit completion tasklet runs to finish cleaning
    up (unlink datagram from list of pending ones, release skb and
    outbound 1394 transaction object) --- before the networking soft IRQ
    had a chance to proceed and add the datagram to the list of pending
    datagrams.

This caused a panic in the 1394 async transmit completion tasklet when
it dereferenced unitialized list heads:
http://bugzilla.kernel.org/show_bug.cgi?id=15077

The fix is to add checks in the tx soft RQ and in the tasklet to
determine who of these two is the last referrer to the transaction
object.  Then handle the cleanup of the object by the last referrer
rather than assuming that the tasklet is always the last one.

There is another similar race:  Between said tasklet and fwnet_close,
i.e. at ifdown.  However, that race is much less likely to occur in
practice and shall be fixed in a separate update.

Reported-by: Илья Басин <basinilya@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---

Илья, could you give this a try?

 drivers/firewire/net.c |   53 ++++++++++++++++++++++++++++++-----------
 1 file changed, 39 insertions(+), 14 deletions(-)

Index: linux-2.6.32.2/drivers/firewire/net.c
===================================================================
--- linux-2.6.32.2.orig/drivers/firewire/net.c
+++ linux-2.6.32.2/drivers/firewire/net.c
@@ -893,20 +893,31 @@ static void fwnet_receive_broadcast(stru
 
 static struct kmem_cache *fwnet_packet_task_cache;
 
+static void fwnet_free_ptask(struct fwnet_packet_task *ptask)
+{
+	dev_kfree_skb_any(ptask->skb);
+	kmem_cache_free(fwnet_packet_task_cache, ptask);
+}
+
 static int fwnet_send_packet(struct fwnet_packet_task *ptask);
 
 static void fwnet_transmit_packet_done(struct fwnet_packet_task *ptask)
 {
-	struct fwnet_device *dev;
+	struct fwnet_device *dev = ptask->dev;
 	unsigned long flags;
-
-	dev = ptask->dev;
+	bool free;
 
 	spin_lock_irqsave(&dev->lock, flags);
-	list_del(&ptask->pt_link);
-	spin_unlock_irqrestore(&dev->lock, flags);
 
-	ptask->outstanding_pkts--; /* FIXME access inside lock */
+	ptask->outstanding_pkts--;
+
+	/* Check whether we or the networking TX soft-IRQ is last user. */
+	free = (ptask->outstanding_pkts == 0 && !list_empty(&ptask->pt_link));
+
+	if (ptask->outstanding_pkts == 0)
+		list_del(&ptask->pt_link);
+
+	spin_unlock_irqrestore(&dev->lock, flags);
 
 	if (ptask->outstanding_pkts > 0) {
 		u16 dg_size;
@@ -951,10 +962,10 @@ static void fwnet_transmit_packet_done(s
 			ptask->max_payload = skb->len + RFC2374_FRAG_HDR_SIZE;
 		}
 		fwnet_send_packet(ptask);
-	} else {
-		dev_kfree_skb_any(ptask->skb);
-		kmem_cache_free(fwnet_packet_task_cache, ptask);
 	}
+
+	if (free)
+		fwnet_free_ptask(ptask);
 }
 
 static void fwnet_write_complete(struct fw_card *card, int rcode,
@@ -977,6 +988,7 @@ static int fwnet_send_packet(struct fwne
 	unsigned tx_len;
 	struct rfc2734_header *bufhdr;
 	unsigned long flags;
+	bool free;
 
 	dev = ptask->dev;
 	tx_len = ptask->max_payload;
@@ -1022,12 +1034,16 @@ static int fwnet_send_packet(struct fwne
 				generation, SCODE_100, 0ULL, ptask->skb->data,
 				tx_len + 8, fwnet_write_complete, ptask);
 
-		/* FIXME race? */
 		spin_lock_irqsave(&dev->lock, flags);
-		list_add_tail(&ptask->pt_link, &dev->broadcasted_list);
+
+		/* If the AT tasklet already ran, we may be last user. */
+		free = (ptask->outstanding_pkts == 0 && list_empty(&ptask->pt_link));
+		if (!free)
+			list_add_tail(&ptask->pt_link, &dev->broadcasted_list);
+
 		spin_unlock_irqrestore(&dev->lock, flags);
 
-		return 0;
+		goto out;
 	}
 
 	fw_send_request(dev->card, &ptask->transaction,
@@ -1035,12 +1051,19 @@ static int fwnet_send_packet(struct fwne
 			ptask->generation, ptask->speed, ptask->fifo_addr,
 			ptask->skb->data, tx_len, fwnet_write_complete, ptask);
 
-	/* FIXME race? */
 	spin_lock_irqsave(&dev->lock, flags);
-	list_add_tail(&ptask->pt_link, &dev->sent_list);
+
+	/* If the AT tasklet already ran, we may be last user. */
+	free = (ptask->outstanding_pkts == 0 && list_empty(&ptask->pt_link));
+	if (!free)
+		list_add_tail(&ptask->pt_link, &dev->sent_list);
+
 	spin_unlock_irqrestore(&dev->lock, flags);
 
 	dev->netdev->trans_start = jiffies;
+ out:
+	if (free)
+		fwnet_free_ptask(ptask);
 
 	return 0;
 }
@@ -1298,6 +1321,8 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 	spin_unlock_irqrestore(&dev->lock, flags);
 
 	ptask->max_payload = max_payload;
+	INIT_LIST_HEAD(&ptask->pt_link);
+
 	fwnet_send_packet(ptask);
 
 	return NETDEV_TX_OK;


-- 
Stefan Richter
-=====-==-=- ---= =--=-
http://arcgraph.de/sr/


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] firewire: net: fix panic in fwnet_write_complete
  2010-01-18 21:36           ` [PATCH] firewire: net: fix panic in fwnet_write_complete Stefan Richter
@ 2010-01-19  0:42             ` Stefan Richter
  2010-01-26 20:09               ` Stefan Richter
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Richter @ 2010-01-19  0:42 UTC (permalink / raw)
  To: Илья Басин
  Cc: linux1394-devel, linux-kernel

Stefan Richter wrote:
> The fix is to add checks in the tx soft RQ and in the tasklet to
> determine who of these two is the last referrer to the transaction
> object.  Then handle the cleanup of the object by the last referrer
> rather than assuming that the tasklet is always the last one.
...
> Илья, could you give this a try?

My own testing on a dual core box --- peered with another Linux box
which ran the older eth1394 --- worked OK so far for transfers of
massive files (> 4 GiB) back and forth via FTP and ssh running on a text
console.

But in my first attempt to use FTP on X11 --- i.e. with more concurrent
interrupt sources --- the firewire-net box crashed very soon.  In that
test I used Dolphin of KDE as FTP client, and the crash already happened
after Dolphin had loaded and displayed the remote home directory and was
peeking into files for preview data.  I got the following trace:

------------: cut here ]------------			
kernel: BUG at mm/slab.c:2885!  			
invalid: opcode: 0000 [#1]				
PREEMPT:						
SMP:							
DEBUG_PAGEALLOC:					
							
last: sysfs file:
/sys/devices/pci0000:00/0000:00:1e.0/0000:03:03.0/fw0/units								
Modules: linked in:					
firewire_net:						
firewire_ohci:  					
firewire_core:  					
netconsole:						
nfs:							
lockd:  						
sunrpc: 						
i915:							
drm_kms_helper: 					
drm:							
i2c_algo_bit:						
snd_hda_codec_idt:					
snd_hda_intel:  					
snd_hda_codec:  					
applesmc:						
led_class:						
input_polldev:  					
snd_pcm:						
rtc:							
i2c_i801:						
snd_timer:						
sg:							
sky2:							
snd:							
video:  						
backlight:						
snd_page_alloc: 					
thermal:						
output: 						
button: 						
							
							
Pid: 4267, comm: kio_thumbnail Not tainted 2.6.33-rc4 #2
Mac-F4208EC8/Macmini1,1							
EIP: 0060:[<c1080e4f>] EFLAGS: 00010006 CPU: 0  	
EIP: is at cache_free_debugcheck+0x1e8/0x2e8		
EAX: f8efddca EBX: f2008000 ECX: 65727661 EDX: 00c38e0a
ESI: d84156c5 EDI: f707fe40 EBP: f07fddc0 ESP: f07fdd80
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068		
Process: kio_thumbnail (pid: 4267, ti=f07fc000 task=f079b6e8
task.ti=f07fc000)  							
Stack:  						
f2008e20:						
00000000:						
0000000c:						
c11c6f96:						
00640100:						
f2008000:						
635688c0:						
d84156c5:						
							
kernel: 						
f2008ec0:						
00000082:						
f2008e18:						
00000000:						
00000000:						
f715ff30:						
f707fe40:						
f2008e20:						
							
kernel: 						
f07fddd8:						
c108137c:						
00000282:
f2008e20:
f2008e3c:
00000060:
f07fdde4:
c11c6f96:

Call: Trace:
? __kfree_skb+0x6e/0x71
? kmem_cache_free+0x56/0xb0
? __kfree_skb+0x6e/0x71
? kfree_skb+0x2b/0x2d
? unix_stream_recvmsg+0x3c3/0x48d
? file_read_actor+0x74/0xcc
? sock_aio_read+0xf2/0x107
? do_sync_read+0x89/0xc7


This was all that made it out via netconsole.  This was on 2.6.33-rc4
and I think I am going back to 2.6.32.y to eliminate unrelated .33-rc
flakiness from my testing.
-- 
Stefan Richter
-=====-==-=- ---= =--==
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] firewire: net: fix panic in fwnet_write_complete
  2010-01-19  0:42             ` Stefan Richter
@ 2010-01-26 20:09               ` Stefan Richter
  2010-01-31  9:45                 ` Re[2]: " Илья Басин
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Richter @ 2010-01-26 20:09 UTC (permalink / raw)
  To: Илья Басин
  Cc: linux1394-devel, linux-kernel

Stefan Richter wrote:
> My own testing on a dual core box --- peered with another Linux box
> which ran the older eth1394 --- worked OK so far for transfers of
> massive files (> 4 GiB) back and forth via FTP and ssh running on a text
> console.
> 
> But in my first attempt to use FTP on X11 --- i.e. with more concurrent
> interrupt sources --- the firewire-net box crashed very soon.  In that
> test I used Dolphin of KDE as FTP client, and the crash already happened
> after Dolphin had loaded and displayed the remote home directory and was
> peeking into files for preview data.  I got the following trace:
> 
> ------------: cut here ]------------			
> kernel: BUG at mm/slab.c:2885!  			
[...]
> EIP: is at cache_free_debugcheck+0x1e8/0x2e8		
[...]
> Call: Trace:
> ? __kfree_skb+0x6e/0x71
> ? kmem_cache_free+0x56/0xb0
> ? __kfree_skb+0x6e/0x71
> ? kfree_skb+0x2b/0x2d
> ? unix_stream_recvmsg+0x3c3/0x48d
> ? file_read_actor+0x74/0xcc
> ? sock_aio_read+0xf2/0x107
> ? do_sync_read+0x89/0xc7

Hi Илья,

I am going to send a pull request for some other unrelated firewire
fixes to Linus about tomorrow.

firewire-net on the other hand needs still more work than my
fwnet_write_complete patch since you and I now get these kmem cache
corruption related bugs.

What is your impression --- does this first incomplete fix decrease the
likelihood of crashes enough to make it worth to include it in a pull
request already?  I haven't done more extensive firewire-net tests since
last week yet, hence it is hard to tell for me how severe the new issue
is in practical use.

(Also, I have no idea yet whether I will be quick or slow to find this
other problem and whether it can be fixed in a manner that is suitable
for a mainline merge before 2.6.33 is going to be released.)
-- 
Stefan Richter
-=====-==-=- ---= ==-=-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re[2]: [PATCH] firewire: net: fix panic in fwnet_write_complete
  2010-01-26 20:09               ` Stefan Richter
@ 2010-01-31  9:45                 ` Илья Басин
  2010-01-31 10:31                   ` Stefan Richter
  0 siblings, 1 reply; 5+ messages in thread
From: Илья Басин @ 2010-01-31  9:45 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, linux-kernel

SR> Stefan Richter wrote:
>> My own testing on a dual core box --- peered with another Linux box
>> which ran the older eth1394 --- worked OK so far for transfers of
>> massive files (> 4 GiB) back and forth via FTP and ssh running on a text
>> console.
>> 
>> But in my first attempt to use FTP on X11 --- i.e. with more concurrent
>> interrupt sources --- the firewire-net box crashed very soon.  In that
>> test I used Dolphin of KDE as FTP client, and the crash already happened
>> after Dolphin had loaded and displayed the remote home directory and was
>> peeking into files for preview data.  I got the following trace:
>> 
>> ------------: cut here ]------------                  
>> kernel: BUG at mm/slab.c:2885!                        
SR> [...]
>> EIP: is at cache_free_debugcheck+0x1e8/0x2e8          
SR> [...]
>> Call: Trace:
>> ? __kfree_skb+0x6e/0x71
>> ? kmem_cache_free+0x56/0xb0
>> ? __kfree_skb+0x6e/0x71
>> ? kfree_skb+0x2b/0x2d
>> ? unix_stream_recvmsg+0x3c3/0x48d
>> ? file_read_actor+0x74/0xcc
>> ? sock_aio_read+0xf2/0x107
>> ? do_sync_read+0x89/0xc7

SR> Hi Илья,

SR> I am going to send a pull request for some other unrelated firewire
SR> fixes to Linus about tomorrow.

SR> firewire-net on the other hand needs still more work than my
SR> fwnet_write_complete patch since you and I now get these kmem cache
SR> corruption related bugs.

SR> What is your impression --- does this first incomplete fix decrease the
SR> likelihood of crashes enough to make it worth to include it in a pull
SR> request already?  I haven't done more extensive firewire-net tests since
SR> last week yet, hence it is hard to tell for me how severe the new issue
SR> is in practical use.

SR> (Also, I have no idea yet whether I will be quick or slow to find this
SR> other problem and whether it can be fixed in a manner that is suitable
SR> for a mainline merge before 2.6.33 is going to be released.)

Hi. Have just found 4 letters from you in gmail spam bin, including
this one. Your fix does decrease the chance of crash. Without the fix
crashes randomly occured within 3 minutes. With the fix - only after
10 minutes of copying. These 2 bugs are probably unrelated.

-- 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] firewire: net: fix panic in fwnet_write_complete
  2010-01-31  9:45                 ` Re[2]: " Илья Басин
@ 2010-01-31 10:31                   ` Stefan Richter
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Richter @ 2010-01-31 10:31 UTC (permalink / raw)
  To: Илья Басин
  Cc: linux1394-devel, linux-kernel

Илья Басин wrote:
> SR> Stefan Richter wrote:
> SR> I am going to send a pull request for some other unrelated firewire
> SR> fixes to Linus about tomorrow.
[...]
> SR> What is your impression --- does this first incomplete fix decrease the
> SR> likelihood of crashes enough to make it worth to include it in a pull
> SR> request already?  I haven't done more extensive firewire-net tests since
> SR> last week yet, hence it is hard to tell for me how severe the new issue
> SR> is in practical use.
[...]
> Hi. Have just found 4 letters from you in gmail spam bin, including
> this one.

(One reason may be that my mail service provider occasionally gets
blacklisted in SORBS, perhaps other DNSBLs.  Or they distrust kyrillic
letters in mails from .de or whatever.)

> Your fix does decrease the chance of crash. Without the fix
> crashes randomly occured within 3 minutes. With the fix - only after
> 10 minutes of copying. These 2 bugs are probably unrelated.

OK.  The patch didn't make it into 2.6.33-rc6 from last Friday but I
will see to it that it gets merged before the final release.  I will
also try to find time to learn more about the other bug.
-- 
Stefan Richter
-=====-==-=- ---= =====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-01-31 10:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <574312453.20100114002526@gmail.com>
     [not found] ` <4B4E4D73.5060100@s5r6.in-berlin.de>
     [not found]   ` <4B4E4FD7.9060403@s5r6.in-berlin.de>
     [not found]     ` <4B53291A.5050901@s5r6.in-berlin.de>
     [not found]       ` <1833218156.20100118151651@gmail.com>
     [not found]         ` <4B547154.40700@s5r6.in-berlin.de>
2010-01-18 21:36           ` [PATCH] firewire: net: fix panic in fwnet_write_complete Stefan Richter
2010-01-19  0:42             ` Stefan Richter
2010-01-26 20:09               ` Stefan Richter
2010-01-31  9:45                 ` Re[2]: " Илья Басин
2010-01-31 10:31                   ` Stefan Richter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox