* libata slab corruption saga
@ 2005-05-26 7:32 Denis Vlasenko
2005-05-26 7:47 ` Jeff Garzik
0 siblings, 1 reply; 3+ messages in thread
From: Denis Vlasenko @ 2005-05-26 7:32 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-kernel
Hi Jeff,
Unfortunately it still happens even without IRQ sharing.
2005-05-25_14:54:01.79454 kern.err: ata1: command 0x35 timeout, stat 0x50 host_stat 0x1
2005-05-25_14:54:04.10684 kern.err: Slab corruption: start=c19d02fc, len=344
2005-05-25_14:54:04.10985 kern.err: Redzone: 0x5a2cf071/0x5a2cf071.
2005-05-25_14:54:04.10987 kern.err: Last user: [<c03b29f9>](scsi_put_command+0x49/0x80)
2005-05-25_14:54:04.10989 kern.err: 010: 6b 6b 6b 6b 6b 6b 6b 6b 08 0a 9d c1 6b 6b 6b 6b
It's 'use after free', someone seems to store 4-byte word into offset 0x18.
This word seems to be a kernel pointer (0xc19d0a08).
I may be mistaken, but I think it is a scsi_cmnd.eh_entry.next.
It seems that scsi_cmnd was freed (see below) and scsi_cmnd offset 0x18
is eh_entry:
struct list_head {
struct list_head *next, *prev;
};
struct scsi_cmnd {
int sc_magic;
struct scsi_device *device;
unsigned short state;
unsigned short owner;
struct scsi_request *sc_request;
struct list_head list; /* scsi_cmnd participates in queue lists */
struct list_head eh_entry; /* entry for the host eh_cmd_q */
2005-05-25_14:54:04.10991 kern.err: Prev obj: start=c19d0198, len=344
2005-05-25_14:54:04.10993 kern.err: Redzone: 0x5a2cf071/0x5a2cf071.
2005-05-25_14:54:04.10995 kern.err: Last user: [<c03b29f9>](scsi_put_command+0x49/0x80)
2005-05-25_14:54:04.10996 kern.err: 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
2005-05-25_14:54:04.10998 kern.err: 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
2005-05-25_14:54:04.11002 kern.err: Next obj: start=c19d0460, len=344
2005-05-25_14:54:04.11004 kern.err: Redzone: 0x5a2cf071/0x5a2cf071.
2005-05-25_14:54:04.11006 kern.err: Last user: [<c03b29f9>](scsi_put_command+0x49/0x80)
2005-05-25_14:54:04.11007 kern.err: 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
2005-05-25_14:54:04.11009 kern.err: 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Looks like "Last user scsi_put_command+0x49" corresponds to list_empty(),
although asm output is a bit strange. Judge for yourself:
void scsi_put_command(struct scsi_cmnd *cmd)
{
struct scsi_device *sdev = cmd->device;
struct Scsi_Host *shost = sdev->host;
unsigned long flags;
/* serious error if the command hasn't come from a device list */
spin_lock_irqsave(&cmd->device->list_lock, flags);
BUG_ON(list_empty(&cmd->list));
list_del_init(&cmd->list);
spin_unlock(&cmd->device->list_lock);
/* changing locks here, don't need to restore the irq state */
spin_lock(&shost->free_list_lock);
asm("#0");
if (unlikely(list_empty(&shost->free_list))) { <==============
asm("#1");
list_add(&cmd->list, &shost->free_list);
cmd = NULL;
}
spin_unlock_irqrestore(&shost->free_list_lock, flags);
if (likely(cmd != NULL))
kmem_cache_free(shost->cmd_pool->slab, cmd);
put_device(&sdev->sdev_gendev);
}
Corresponding asm:
#APP
#0
#NO_APP
leal 20(%esi), %edx
movl 20(%esi), %eax
cmpl %edx, %eax
je .L132
.L127:
#APP
pushl %edi ; popfl
#NO_APP
testl %ebx, %ebx
je .L130
pushl %ebx
movl 16(%esi), %eax
movl (%eax), %ecx
pushl %ecx
call kmem_cache_free
popl %eax <========================== scsi_put_command+0x49
popl %edx
.L130:
movl -16(%ebp), %eax
addl $400, %eax
movl %eax, 8(%ebp)
leal -12(%ebp), %esp
popl %ebx
popl %esi
popl %edi
popl %ebp
jmp put_device
.L132:
#APP
#1
#NO_APP
Hope this helps.
--
vda
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: libata slab corruption saga
2005-05-26 7:32 libata slab corruption saga Denis Vlasenko
@ 2005-05-26 7:47 ` Jeff Garzik
2005-05-27 5:37 ` Denis Vlasenko
0 siblings, 1 reply; 3+ messages in thread
From: Jeff Garzik @ 2005-05-26 7:47 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 49 bytes --]
Does the attached patch change things?
Jeff
[-- Attachment #2: atapi-fix-error-handling.patch --]
[-- Type: text/x-patch, Size: 4387 bytes --]
From: Hannes Reinecke <hare@suse.de>
Subject: Fix sata atapi error handling
References: 70918
SCSI commands which end up on the error handler need special attention;
we have to make sure that eh_cmd_q is properly emptied or scsi_eh will
try to forever finalize the command.
With this patch eh_cmd_q is explicitely emptied if not done so in the
strategy handler and a proper abort sequence is executed for each
command if required.
We rely on the strategy handler to fill out proper sense information for
us as SATA is 'special' when it comes to command sense gathering.
Signed-off-by: Kurt Garloff <garloff@suse.de>
Signed-off-by: Jens Axboe <axboe@suse.de>
Acked-by: Andreas Gruenbacher <agruen@suse.de>
Index: linux-2.6.11/drivers/scsi/libata-core.c
===================================================================
--- linux-2.6.11.orig/drivers/scsi/libata-core.c
+++ linux-2.6.11/drivers/scsi/libata-core.c
@@ -41,6 +41,7 @@
#include <scsi/scsi.h>
#include "scsi.h"
#include "scsi_priv.h"
+#include "scsi_logging.h"
#include <scsi/scsi_host.h>
#include <linux/libata.h>
#include <asm/io.h>
@@ -2587,6 +2588,11 @@ static void atapi_request_sense(struct a
DPRINTK("EXIT\n");
}
+void ata_qc_timeout_done(struct scsi_cmnd *scmd)
+{
+ return;
+}
+
/**
* ata_qc_timeout - Handle timeout of queued command
* @qc: Command that timed out
@@ -2618,17 +2624,16 @@ static void ata_qc_timeout(struct ata_qu
struct scsi_cmnd *cmd = qc->scsicmd;
if (!scsi_eh_eflags_chk(cmd, SCSI_EH_CANCEL_CMD)) {
-
/* finish completing original command */
+ qc->scsidone = ata_qc_timeout_done;
+
__ata_qc_complete(qc);
atapi_request_sense(ap, dev, cmd);
cmd->result = (CHECK_CONDITION << 1) | (DID_OK << 16);
- scsi_finish_command(cmd);
-
- goto out;
}
+ goto out;
}
/* hack alert! We cannot use the supplied completion
Index: linux-2.6.11/drivers/scsi/libata-scsi.c
===================================================================
--- linux-2.6.11.orig/drivers/scsi/libata-scsi.c
+++ linux-2.6.11/drivers/scsi/libata-scsi.c
@@ -633,12 +633,6 @@ int ata_scsi_error(struct Scsi_Host *hos
ap = (struct ata_port *) &host->hostdata[0];
ap->ops->eng_timeout(ap);
- /* TODO: this is per-command; when queueing is supported
- * this code will either change or move to a more
- * appropriate place
- */
- host->host_failed--;
-
DPRINTK("EXIT\n");
return 0;
}
Index: linux-2.6.11/drivers/scsi/scsi_error.c
===================================================================
--- linux-2.6.11.orig/drivers/scsi/scsi_error.c
+++ linux-2.6.11/drivers/scsi/scsi_error.c
@@ -1610,6 +1610,40 @@ static void scsi_unjam_host(struct Scsi_
scsi_eh_flush_done_q(&eh_done_q);
}
+static void scsi_invoke_strategy_handler(struct Scsi_Host *shost)
+{
+ int rtn;
+ struct list_head *lh, *lh_sf;
+ struct scsi_cmnd *scmd;
+ unsigned long flags;
+ LIST_HEAD(eh_work_q);
+ LIST_HEAD(eh_done_q);
+
+ rtn = shost->hostt->eh_strategy_handler(shost);
+
+ spin_lock_irqsave(shost->host_lock, flags);
+ list_splice_init(&shost->eh_cmd_q, &eh_work_q);
+ spin_unlock_irqrestore(shost->host_lock, flags);
+
+ SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
+
+ list_for_each_safe(lh, lh_sf, &eh_work_q) {
+ scmd = list_entry(lh, struct scsi_cmnd, eh_entry);
+
+ if (scsi_eh_eflags_chk(scmd, SCSI_EH_CANCEL_CMD) ||
+ !SCSI_SENSE_VALID(scmd))
+ continue;
+ scmd->retries = scmd->allowed;
+ scsi_eh_finish_cmd(scmd, &eh_done_q);
+ }
+
+ if (!list_empty(&eh_work_q))
+ if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
+ scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
+
+ scsi_eh_flush_done_q(&eh_done_q);
+}
+
/**
* scsi_error_handler - Handle errors/timeouts of SCSI cmds.
* @data: Host for which we are running.
@@ -1624,7 +1658,6 @@ static void scsi_unjam_host(struct Scsi_
int scsi_error_handler(void *data)
{
struct Scsi_Host *shost = (struct Scsi_Host *) data;
- int rtn;
DECLARE_MUTEX_LOCKED(sem);
/*
@@ -1680,8 +1713,8 @@ int scsi_error_handler(void *data)
* what we need to do to get it up and online again (if we can).
* If we fail, we end up taking the thing offline.
*/
- if (shost->hostt->eh_strategy_handler)
- rtn = shost->hostt->eh_strategy_handler(shost);
+ if (shost->hostt->eh_strategy_handler)
+ scsi_invoke_strategy_handler(shost);
else
scsi_unjam_host(shost);
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: libata slab corruption saga
2005-05-26 7:47 ` Jeff Garzik
@ 2005-05-27 5:37 ` Denis Vlasenko
0 siblings, 0 replies; 3+ messages in thread
From: Denis Vlasenko @ 2005-05-27 5:37 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-kernel
On Thursday 26 May 2005 10:47, Jeff Garzik wrote:
> Does the attached patch change things?
Yes. As soon as first ata error occurs:
22:01:59.006541500 kern.err: ata1: command 0x25 timeout, stat 0x50 host_stat 0x1
22:01:59.007252500 kern.alert: Unable to handle kernel paging request at virtual address 6b6b6b6b
22:01:59.008197500 kern.alert: printing eip:
22:01:59.009304500 kern.info: c03b5d7a
22:01:59.010231500 kern.alert: *pde = 00000000
22:01:59.010919500 kern.alert: Oops: 0000 [#1]
22:01:59.011948500 kern.info: Modules linked in: snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd soundcore snd_page_alloc ipt_REDIRECT ipt_MASQUERADE ipt_multiport ipt_state iptable_nat ip_conntrack iptable_filter cls_u32 sch_htb iptable_mangle autofs ip_tables
22:01:59.012521500 kern.info: CPU: 0
22:01:59.013618500 kern.info: EIP: 0060:[<c03b5d7a>] Not tainted VLI
22:01:59.014569500 kern.info: EFLAGS: 00010202 (2.6.12-rc2-cl)
22:01:59.015709500 kern.info: EIP is at scsi_try_to_abort_cmd+0xa/0x50
22:01:59.016749500 kern.info: eax: 6b6b6b6b ebx: c19d0460 ecx: c19d0460 edx: c19d0478
22:01:59.017802500 kern.info: esi: c19cffb4 edi: c19cffb4 ebp: c19cff84 esp: c19cff80
22:01:59.018791500 kern.info: ds: 007b es: 007b ss: 0068
22:01:59.019865500 kern.info: Process scsi_eh_0 (pid: 478, threadinfo=c19cf000 task=c1968530)
22:01:59.020918500 kern.info: Stack: c19d0460 c19cff9c c03b5eb7 c19d0460 c19d0478 c19cffb4 00000246 c19cffc8
22:01:59.021945500 kern.info: c03b6a92 c19cffb4 c19cffac c19cffac c19cffac c19d0478 c19d0478 c19cffd4
22:01:59.023051500 kern.info: dfc1507c 00000000 c19cffec c03b6b16 dfc1507c 00000000 00000000 c19cffdc
22:01:59.023907500 kern.info: Call Trace:
22:01:59.025073500 kern.info: [<c0103cc5>] show_stack+0x75/0x90
22:01:59.027613500 kern.info: [<c0103e19>] show_registers+0x119/0x190
22:01:59.028311500 kern.info: [<c0103ff5>] die+0xb5/0x130
22:01:59.029231500 kern.info: [<c0110728>] do_page_fault+0x458/0x6d6
22:01:59.030303500 kern.info: [<c01038ef>] error_code+0x4f/0x60
22:01:59.031213500 kern.info: [<c03b5eb7>] scsi_eh_abort_cmds+0x37/0x90
22:01:59.032241500 kern.info: [<c03b6a92>] scsi_invoke_strategy_handler+0xd2/0xf0
22:01:59.033137500 kern.info: [<c03b6b16>] scsi_error_handler+0x66/0xd0
22:01:59.033840500 kern.info: [<c0100cc5>] kernel_thread_helper+0x5/0x10
22:01:59.034854500 kern.info: Code: 00 00 00 5d c3 8b 43 38 89 43 34 8b 4d 0c 51 53 e8 1c ff ff ff 58 5a eb 88 90 8d b4 26 00 00 00 00 55 89 e5 8b 4d 08 53 8b 41 04 <8b> 00 8b 40 5c 8b 40 20 85 c0 ba 03 20 00 00 74 23 8b 59 2c 85
We hit use-after-free here:
static int scsi_try_to_abort_cmd(struct scsi_cmnd *scmd)
{
unsigned long flags;
int rtn = FAILED;
if (!scmd->device->host->hostt->eh_abort_handler)
return rtn;
Seems like struct scsi_cmnd pointed by scmd is filled by slab poisoning pattern:
000007d0 <scsi_try_to_abort_cmd>:
7d0: 55 push %ebp
7d1: 89 e5 mov %esp,%ebp
7d3: 8b 4d 08 mov 0x8(%ebp),%ecx ecx = scmd
7d6: 53 push %ebx
7d7: 8b 41 04 mov 0x4(%ecx),%eax eax = scmd->device (0x6b6b6b6b6)
7da: 8b 00 mov (%eax),%eax trying to get scmd->device->host, BOOM
7dc: 8b 40 5c mov 0x5c(%eax),%eax
7df: 8b 40 20 mov 0x20(%eax),%eax
7e2: 85 c0 test %eax,%eax
7e4: ba 03 20 00 00 mov $0x2003,%edx
7e9: 74 23 je 80e <scsi_try_to_abort_cmd+0x3e>
--
vda
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2005-05-27 6:07 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-26 7:32 libata slab corruption saga Denis Vlasenko
2005-05-26 7:47 ` Jeff Garzik
2005-05-27 5:37 ` Denis Vlasenko
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.