All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG in scsi_lib.c due to a bad commit
@ 2014-11-11 23:33 Barto
  2014-11-12  0:17 ` Bjorn Helgaas
  0 siblings, 1 reply; 22+ messages in thread
From: Barto @ 2014-11-11 23:33 UTC (permalink / raw)
  To: linux-kernel

Hello everyone,

I notice a bug since kernel 3.17 ( and also with 3.18 branch ), a random
hang at boot on some PC configurations, I did a "git bisect" and I found
that the culprit is  :

[045065d8a300a37218c548e9aa7becd581c6a0e8] [SCSI] fix qemu boot hang problem

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=045065d8a300a37218c548e9aa7becd581c6a0e8

the author of this commit has choosen to inverse the logic of the if
statement in the file  drivers/scsi/scsi_lib.c in order to solve an
issue with qemu :

--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1774,7 +1774,7 @@ static void scsi_request_fn(struct request_queue *q)
blk_requeue_request(q, req);
atomic_dec(&sdev->device_busy);
out_delay:
- if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
+ if (!atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
blk_delay_queue(q, SCSI_QUEUE_DELAY);
}

this change triggers a bug on my PC ( I don't have SCSI devices, but
only 3 SATA harddisks and 2 IDE harddisks, SATA disks are on an ICH7
sata controler on the motherboard gigabyte GA-P31-DS3L, and IDE disk on
a JMicron JMB363/368 Sata/IDE PCIe card ), every 5~10 boots the boot
stops suddenly because of this commit,

If I revert this commit then the bug is gone, more details can be found
here, where I created a patch who reverts commit 045065d8 :

https://bugzilla.kernel.org/show_bug.cgi?id=87581

my question: why Guenter Roeck ( the author of the bad commit ) has
choosen to inverse the logic in the if statement ?
before his commit the if statement was like this :

if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
blk_delay_queue(q, SCSI_QUEUE_DELAY);

if his decision to inverse the logic of 
"atomic_read(&sdev->device_busy)" is acceptable then the real bug is
probably located elsewhere in the scsi source code, and we must solve
this mistery because there is obviously a bug regression in SCSI code
because with older kernels ( 3.16.7 and lower ) I don't have the random
hang boot bug with my configuration,

another user in archlinux forums has also this bug and he has a more
modern PC ( intel i7 core cpu, SSD device ), my fear is when linux
distros will move to kernel 3.17 then more users will have this weird
random bug who can occur only on boot and only with a specific PC
configuration, if the boot step is passed despite the random bug then
the bug will not occur, it occurs only during the boot process, which
probably means that the faulty source code is only called during the
boot process,

thanks for anyone who wants to dig this problem with me

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: BUG in scsi_lib.c due to a bad commit
@ 2014-11-20  6:09 Christoph Hellwig
  2014-11-20 17:44 ` Barto
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2014-11-20  6:09 UTC (permalink / raw)
  To: Barto
  Cc: Elliott, Robert (Server Storage), Guenter Roeck, linux-kernel,
	linux-scsi

I

Hi Barto,

sorry for the late reply, and thanks for drilling down the exact
conditions.

I think we have some issues with the lack of the host lock vs error
handling, but I still don't undertand the details.

I've got a test patch for you that just adds the host lock back in a few
places while keeing the atomic_t.  Can you try to reproduce with that
one?

When breaking an existing setup in software it's always the softwares
fault, btw..


diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 994eb08..99b77f7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -311,16 +311,16 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 	struct scsi_target *starget = scsi_target(sdev);
 	unsigned long flags;
 
+	spin_lock_irqsave(shost->host_lock, flags);
 	atomic_dec(&shost->host_busy);
 	if (starget->can_queue > 0)
 		atomic_dec(&starget->target_busy);
 
 	if (unlikely(scsi_host_in_recovery(shost) &&
 		     (shost->host_failed || shost->host_eh_scheduled))) {
-		spin_lock_irqsave(shost->host_lock, flags);
 		scsi_eh_wakeup(shost);
-		spin_unlock_irqrestore(shost->host_lock, flags);
 	}
+	spin_unlock_irqrestore(shost->host_lock, flags);
 
 	atomic_dec(&sdev->device_busy);
 }
@@ -1504,6 +1504,8 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 	if (scsi_host_in_recovery(shost))
 		return 0;
 
+	spin_lock_irq(shost->host_lock);
+
 	busy = atomic_inc_return(&shost->host_busy) - 1;
 	if (atomic_read(&shost->host_blocked) > 0) {
 		if (busy)
@@ -1527,21 +1529,19 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 
 	/* We're OK to process the command, so we can't be starved */
 	if (!list_empty(&sdev->starved_entry)) {
-		spin_lock_irq(shost->host_lock);
 		if (!list_empty(&sdev->starved_entry))
 			list_del_init(&sdev->starved_entry);
-		spin_unlock_irq(shost->host_lock);
 	}
 
+	spin_unlock_irq(shost->host_lock);
 	return 1;
 
 starved:
-	spin_lock_irq(shost->host_lock);
 	if (list_empty(&sdev->starved_entry))
 		list_add_tail(&sdev->starved_entry, &shost->starved_list);
-	spin_unlock_irq(shost->host_lock);
 out_dec:
 	atomic_dec(&shost->host_busy);
+	spin_unlock_irq(shost->host_lock);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-11-24 15:13 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-11 23:33 BUG in scsi_lib.c due to a bad commit Barto
2014-11-12  0:17 ` Bjorn Helgaas
2014-11-12  2:53   ` Guenter Roeck
2014-11-13  3:28     ` Barto
2014-11-13  5:33       ` Elliott, Robert (Server Storage)
2014-11-13  5:33         ` Elliott, Robert (Server Storage)
2014-11-13  9:38         ` Barto
2014-11-13 14:29           ` Christoph Hellwig
2014-11-13 15:13             ` Barto
2014-11-13 17:14             ` Barto
2014-11-13 17:54               ` Christoph Hellwig
2014-11-13 22:55                 ` Barto
2014-11-14  7:32                   ` Christoph Hellwig
2014-11-14 16:30                     ` Barto
2014-11-16 18:30                     ` Barto
2014-11-19 20:21                     ` Barto
  -- strict thread matches above, loose matches on Subject: below --
2014-11-20  6:09 Christoph Hellwig
2014-11-20 17:44 ` Barto
2014-11-20 17:53   ` Christoph Hellwig
2014-11-20 18:27     ` Barto
2014-11-24  9:18       ` Christoph Hellwig
2014-11-24 15:12         ` Barto

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.