* [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
@ 2010-09-11  6:06 dave b
  2010-09-11  6:57 ` Heinz Diehl
  0 siblings, 1 reply; 15+ messages in thread
From: dave b @ 2010-09-11  6:06 UTC (permalink / raw)
  To: dm-crypt, bugme-daemon

I am not sure if this really is a bug or just expected behaviour with
ext3 in ordered mode with luks / dm-crypt adding some extra latency
...

----
I am experiencing really bad performance and a fair number of system
stalls using dm-crypt (LUKS) on Debian lenny with an ext3 root
filesystem (/) in ordered mode.

"
Version:       	1
Cipher name:   	aes
Cipher mode:   	cbc-essiv:sha256
Hash spec:     	sha1
"
/dev/mapper/foo-root on / type ext3 (rw,relatime,errors=remount-ro)
Swap is also within the luks partition.

The kernel version is 2.6.35.4.

A simple test is just to dd if=/dev/zero of=DELETEME for a short time
and the system will stall rather a lot - it is not a complete lock up
- just the entire system is unresponsive for large periods at a time
(from around 30 seconds to 2 minutes). This may be related to
http://thread.gmane.org/gmane.linux.kernel.mm/51444 .


The system is a 6-core AMD Phenom with 4 GB of RAM. The hard drive is a
WD 1 TB SATA 3 drive attached to a SATA 3 controller:

00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode] (rev 40) (prog-if 01 [AHCI 1.0])
        Subsystem: ASUSTeK Computer Inc. Device 8443
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 42
        Region 0: I/O ports at c000 [size=8]
        Region 1: I/O ports at b000 [size=4]
        Region 2: I/O ports at a000 [size=8]
        Region 3: I/O ports at 9000 [size=4]
        Region 4: I/O ports at 8000 [size=16]
        Region 5: Memory at fe8ffc00 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/2 Enable+
                Address: 00000000fee3f00c  Data: 4161
        Capabilities: [70] SATA HBA <?>
        Capabilities: [a4] PCIe advanced features <?>
        Kernel driver in use: ahci
        Kernel modules: ahci


* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-09-11  6:06 [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt dave b
@ 2010-09-11  6:57 ` Heinz Diehl
  2010-09-11  8:11   ` dave b
  0 siblings, 1 reply; 15+ messages in thread
From: Heinz Diehl @ 2010-09-11  6:57 UTC (permalink / raw)
  To: dm-crypt

On 11.09.2010, dave b wrote: 

> A simple test is just to dd if=/dev/zero of=DELETEME for a short time
> and the system will stall rather a lot - it is not a complete lock up
> - just the entire system is unresponsive for large periods at a time
> (from around 30 seconds to 2 minutes).

This is most likely due to the massive disk i/o which is generated by the
dd command.

> This may be related to
> http://thread.gmane.org/gmane.linux.kernel.mm/51444 .

I think this is completely unrelated, it's more a kind of a disk scheduler
issue. I can't compare directly, because I'm running XFS on all of my
machines, but you could try to fine-tune your disk scheduler. In any case,
writing a big file and working on the same disk at the same time will always
give you, hmmm.. "some delay", even without using encryption.

Ok, let's give it a short run on one of my machines,
a 10 GB file zeroed out using the dd command you mentioned, on /home,
encrypted with dmcrypt/LUKS, and running Theodore Tso's fsync-tester
on the same partition:

htd@liesel:~/fs> ./fsync-tester
fsync time: 0.5354
fsync time: 0.1496
fsync time: 0.2655
fsync time: 5.1740
fsync time: 6.0126
fsync time: 3.7945
fsync time: 19.0478
fsync time: 11.2006
fsync time: 4.5590
fsync time: 0.1663
fsync time: 0.3466
fsync time: 2.2969
fsync time: 0.7246
fsync time: 0.3036
fsync time: 6.2057
fsync time: 0.1042
fsync time: 0.7644
fsync time: 0.9193
fsync time: 0.1738
fsync time: 0.7413
fsync time: 18.2807
fsync time: 1.1971
fsync time: 0.2808
fsync time: 0.2364
fsync time: 0.3220
fsync time: 0.2108
fsync time: 0.2831
fsync time: 2.6023
fsync time: 14.0767
fsync time: 23.4450
fsync time: 3.2213
fsync time: 4.3488
fsync time: 12.5578
fsync time: 0.2359
^C
htd@liesel:~/fs> 

You can see stalls up to 23 secs here, too. I'm using the latest kernel
2.6.36-rc3 from Linus' git repository, with the "global workqueue per cpu"
patch from Andi Kleen on top of it (which should not give any performance
boost here in this case). Scheduler is cfq, tuned this way:

echo cfq > /sys/block/sda/queue/scheduler
echo "1" > /sys/block/sda/queue/iosched/low_latency
echo "8" > /sys/block/sda/queue/iosched/slice_idle
echo "8" > /sys/block/sda/queue/iosched/quantum

liesel:/home/htd/fs # xfs_repair -V
xfs_repair version 2.10.1

mount: /dev/mapper/home on /home type xfs (rw,noatime,logbsize=256k,logbufs=2,nobarrier,inode64)


/*
 * fsync-tester.c
 *
 * Written by Theodore Ts'o, 3/21/09.
 *
 * This file may be redistributed under the terms of the GNU Public
 * License, version 2.
 */

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>	/* gettimeofday(), struct timeval */
#include <fcntl.h>
#include <string.h>

#define SIZE (32768*32)

static float timeval_subtract(struct timeval *tv1, struct timeval *tv2)
{
	return ((tv1->tv_sec - tv2->tv_sec) +
		((float) (tv1->tv_usec - tv2->tv_usec)) / 1000000);
}

int main(int argc, char **argv)
{
	int	fd;
	struct timeval tv, tv2;
	char buf[SIZE];

	fd = open("fsync-tester.tst-file", O_WRONLY|O_CREAT, 0644);	/* O_CREAT needs a mode */
	if (fd < 0) {
		perror("open");
		exit(1);
	}
	memset(buf, 'a', SIZE);
	while (1) {
		pwrite(fd, buf, SIZE, 0);
		gettimeofday(&tv, NULL);
		fsync(fd);
		gettimeofday(&tv2, NULL);
		printf("fsync time: %5.4f\n", timeval_subtract(&tv2, &tv));
		sleep(1);
	}
}
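
A side note on the timing used in fsync-tester.c: gettimeofday() reads the wall clock, which can be adjusted while the test runs, and the result is returned as a float. The following is a minimal sketch of an alternative, not part of Ts'o's program, that assumes clock_gettime() with CLOCK_MONOTONIC is available and returns the interval as a double. The function names elapsed_seconds and timed_fsync are made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>
#include <unistd.h>

/* Seconds elapsed from *start to *end, computed in double precision. */
double elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
	return (double)(end->tv_sec - start->tv_sec) +
	       (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Time a single fsync() on fd; returns seconds, or -1.0 if a call fails. */
double timed_fsync(int fd)
{
	struct timespec t0, t1;

	if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
		return -1.0;
	if (fsync(fd) != 0)
		return -1.0;
	if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
		return -1.0;
	return elapsed_seconds(&t0, &t1);
}
```

In the tester loop, the two gettimeofday() calls around fsync(fd) could be replaced by one call to timed_fsync(fd), with the result printed using the same "%5.4f" format.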
	


* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-09-11  6:57 ` Heinz Diehl
@ 2010-09-11  8:11   ` dave b
  2010-09-11  9:02     ` dave b
  0 siblings, 1 reply; 15+ messages in thread
From: dave b @ 2010-09-11  8:11 UTC (permalink / raw)
  To: dm-crypt

On 11 September 2010 16:57, Heinz Diehl <htd@fancy-poultry.org> wrote:
> On 11.09.2010, dave b wrote:
>
>> A simple test is just to dd if=/dev/zero of=DELETEME for a short time
>> and the system will stall rather a lot - it is not a complete lock up
>> - just the entire system is unresponsive for large periods at a time
>> (from around 30 seconds to 2 minutes).
>
> This is most likely due to the massive disk i/o which is generated by the
> dd command.

I don't think *this* should stall the entire system :)


>
>> This may be related to
>> http://thread.gmane.org/gmane.linux.kernel.mm/51444 .
>
> I think this is completely unrelated, it's more a kind of a disk scheduler
> issue. I can't compare directly, because I'm running XFS on all of my
> machines, but you could try to fine-tune your disk scheduler. In any case,
> writing a big file and working on the same disk at the same time will always
> give you, hmmm.. "some delay", even without using encryption.

Agreed. :)

>
> You can see stalls up to 23 secs here, too. I'm using the latest kernel
> 2.6.36-rc3 from Linus' git repository, with the "global workqueue per cpu"
> patch from Andi Kleen on top of it (which should not give any performance
> boost here in this case). Scheduler is cfq, tuned this way:

My cfq settings are the same ^^
Well, it isn't just dd - if you do a lot of grepping / find . etc.,
as rkhunter does, then the system also noticeably stalls  :/
I do not think the test you attached is a good representation of the
*actual* behaviour.
What it looks like to me is that there is a total collapse of
scheduled reads vs writes for *all* programs (other than the process
causing the I/O work) for a given period.
When I test the deadline scheduler my system is slightly more usable
:) - noop is a bit iffy.
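
When switching between cfq, deadline and noop like this, it helps to record which scheduler was actually active for each run. The file /sys/block/<dev>/queue/scheduler lists the available schedulers on one line with the active one in square brackets, for example "noop deadline [cfq]". A minimal sketch of extracting that name follows; the function name active_scheduler is hypothetical, and reading the line from sysfs is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) scheduler name from a line such as
 * "noop deadline [cfq]" into out. Returns 0 on success, -1 if no
 * bracketed name is found or out is too small.
 */
int active_scheduler(const char *line, char *out, size_t outlen)
{
	const char *lb = strchr(line, '[');
	const char *rb = lb ? strchr(lb, ']') : NULL;
	size_t len;

	if (lb == NULL || rb == NULL)
		return -1;
	len = (size_t)(rb - lb - 1);
	if (len + 1 > outlen)
		return -1;
	memcpy(out, lb + 1, len);
	out[len] = '\0';
	return 0;
}
```

A test harness could print this name next to each batch of timings so that results from different schedulers are not mixed up.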


Without dding:
atop --> DSK |         sda | busy     64% | read   78040 | write  133104 | avio    3 ms |
hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   6382 MB in  2.00 seconds = 3191.73 MB/sec
 Timing buffered disk reads:  382 MB in  3.02 seconds = 126.67 MB/sec

While dding: (dd if=/dev/zero of=TEMP  count=15553600)
atop --> DSK |         sda | busy     98% | read       5 | write    2057 | avio    4 ms |
(it spiked at 98% for extended periods  vs the one offs seen during
experimenting with noop and deadline)

-----------
Some of the output while dding using: dd if=/dev/zero of=TEMP  count=15553600


CFQ
  924   6.93s   0.00s     0K     0K     0K     0K  --   - R  68% kcryptd


fsync time: 0.0275
fsync time: 0.0232
fsync time: 0.0274
fsync time: 0.0215
fsync time: 0.7706
fsync time: 9.3548
fsync time: 14.4264
fsync time: 10.4625
fsync time: 12.5968
fsync time: 16.5984
fsync time: 15.4739
fsync time: 2.3007
fsync time: 0.0249
fsync time: 0.0513


Deadline:
fsync time: 0.1949
fsync time: 6.9050
fsync time: 14.3582
fsync time: 13.0077
fsync time: 11.6368
fsync time: 12.3486
fsync time: 5.1431
etc.


Really, you can see that deadline is 'worse', even though my system
is more usable than with CFQ.

Let's switch to noop ;) (output for the entire dd run). The system is
still more usable than with CFQ.
fsync time: 0.0243
fsync time: 0.0100
fsync time: 0.0186
fsync time: 1.1323
fsync time: 14.5452
fsync time: 18.4730
fsync time: 18.9720
fsync time: 15.3721
fsync time: 9.4640
fsync time: 1.6990
fsync time: 0.0179
fsync time: 0.0178
fsync time: 0.0477
fsync time: 0.0263
fsync time: 0.0200
fsync time: 0.0279

I think we need a better test :-)
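
Comparing the three scheduler runs by eye is awkward, so one option is to reduce each run to a few numbers. The following is a minimal sketch that reads the tester's output and reports the count, maximum and mean of the "fsync time:" lines. The struct and function names are hypothetical, and lines that do not match (such as "etc.") are skipped.

```c
#include <stdio.h>

struct fsync_stats {
	int count;	/* number of "fsync time:" lines seen */
	double max;	/* longest single fsync, in seconds */
	double mean;	/* average fsync time, in seconds */
};

/* Parse "fsync time: <seconds>" lines from in; other lines are ignored. */
struct fsync_stats summarize_fsync_times(FILE *in)
{
	struct fsync_stats s = {0, 0.0, 0.0};
	char line[256];
	double t, sum = 0.0;

	while (fgets(line, sizeof line, in)) {
		if (sscanf(line, "fsync time: %lf", &t) != 1)
			continue;
		s.count++;
		sum += t;
		if (t > s.max)
			s.max = t;
	}
	if (s.count > 0)
		s.mean = sum / s.count;
	return s;
}
```

Passing stdin lets the tester's output be piped straight into a small program built around this function. Note that fsync latency alone may still not reflect how usable the system feels, which is the concern raised above.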


* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-09-11  8:11   ` dave b
@ 2010-09-11  9:02     ` dave b
  2010-09-11  9:08       ` dave b
  0 siblings, 1 reply; 15+ messages in thread
From: dave b @ 2010-09-11  9:02 UTC (permalink / raw)
  To: dm-crypt

Here is what I would consider a better test, imho :)
/*
 * fsync-tester.c
 *
 * Written by Theodore Ts'o, 3/21/09.
 *
 * This file may be redistributed under the terms of the GNU Public
 * License, version 2.
 */

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>	/* gettimeofday(), struct timeval */
#include <fcntl.h>
#include <string.h>

#define SIZE (32768*32)

static float timeval_subtract(struct timeval *tv1, struct timeval *tv2)
{
	return ((tv1->tv_sec - tv2->tv_sec) +
		((float) (tv1->tv_usec - tv2->tv_usec)) / 1000000);
}

int main(int argc, char **argv)
{
	int fd;
	struct timeval tv, tv2, tv3, tv4;
	char buf[SIZE];

	fd = open("fsync-tester.tst-file", O_RDWR|O_CREAT, 0644);	/* O_CREAT needs a mode */
	if (fd < 0) {
		perror("open");
		exit(1);
	}
	memset(buf, 'a', SIZE);
	while (1) {
		pwrite(fd, buf, SIZE, 0);
		gettimeofday(&tv, NULL);
		fsync(fd);
		gettimeofday(&tv2, NULL);
		printf("fsync time: %5.4f\n", timeval_subtract(&tv2, &tv));
		sleep(1);

		gettimeofday(&tv3, NULL);
		pread(fd, buf, SIZE, 0);
		gettimeofday(&tv4, NULL);
		printf("pread time: %5.4f\n", timeval_subtract(&tv4, &tv3));
		sleep(1);
	}
}

--- fsync-tester.c	2010-09-11 17:25:53.000000000 +1000
+++ mine.c	2010-09-11 18:57:36.000000000 +1000
@@ -27,10 +27,10 @@
 int main(int argc, char **argv)
 {
        int fd;
-       struct timeval tv, tv2;
+       struct timeval tv, tv2, tv3, tv4;
        char buf[SIZE];

-       fd = open("fsync-tester.tst-file", O_WRONLY|O_CREAT);
+       fd = open("fsync-tester.tst-file", O_RDWR|O_CREAT);
        if (fd < 0) {
                perror("open");
                exit(1);
@@ -43,6 +43,12 @@
                gettimeofday(&tv2, NULL);
                printf("fsync time: %5.4f\n", timeval_subtract(&tv2, &tv));
                sleep(1);
+
+               gettimeofday(&tv3, NULL);
+               pread(fd, buf, SIZE, 0);
+               gettimeofday(&tv4, NULL);
+               printf("pread time: %5.4f\n", timeval_subtract(&tv4, &tv3));
+               sleep(1);
        }
 }

This still isn't as good a test as I would like :)

./mine.out
fsync time: 0.0264
pread time: 0.0009
fsync time: 0.0415
pread time: 0.0008
fsync time: 2.4119
pread time: 1.0698
fsync time: 19.5375
pread time: 0.0009
fsync time: 15.9053
pread time: 0.0009
fsync time: 6.3872
pread time: 3.6912
fsync time: 6.8910
pread time: 0.0099
fsync time: 4.0445
pread time: 0.3053
fsync time: 8.3816
pread time: 0.3386
fsync time: 6.0976
pread time: 0.0009
fsync time: 0.0236
pread time: 0.0008
fsync time: 0.0255

So when I try to do other things I see the pread time go way up :)
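
Many of the pread times above are around 0.0009 s, which is consistent with the data being served from the page cache, since the same 1 MiB was written just before. A minimal sketch of one way to make the read more likely to hit the disk follows, assuming posix_fadvise() with POSIX_FADV_DONTNEED is honoured by the kernel and filesystem (it is advisory only). The function name timed_uncached_pread is hypothetical and this is not part of the tester posted above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Ask the kernel to drop the cached pages of fd, then time one pread()
 * of len bytes from offset 0. Returns seconds, or -1.0 on error.
 */
double timed_uncached_pread(int fd, void *buf, size_t len)
{
	struct timespec t0, t1;

	/* Dirty pages cannot be dropped, so flush them first. */
	if (fsync(fd) != 0)
		return -1.0;
	/* Advisory only: offset 0 with length 0 covers the whole file. */
	if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (pread(fd, buf, len, 0) < 0)
		return -1.0;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

The extra fsync() inside the function means the measured read also waits behind any writeback it triggers, so the numbers are not directly comparable with the pread times listed above.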


* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-09-11  9:02     ` dave b
@ 2010-09-11  9:08       ` dave b
  2010-09-12  8:36         ` dave b
  0 siblings, 1 reply; 15+ messages in thread
From: dave b @ 2010-09-11  9:08 UTC (permalink / raw)
  To: dm-crypt

> So when I try to do other things I see the pread time go way up :)
>

%s/go/goes
Sorry, what I meant was that when I try the dd and run this modified
tester, if I try to perform
another task the pread time output is significantly higher than the norm.
If I do not attempt to do other things (try to open new programs /
change windows) then the pread time seems to stay fairly constant. (as
I expected :) ).


* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-09-11  9:08       ` dave b
@ 2010-09-12  8:36         ` dave b
  2010-09-12  9:04           ` Heinz Diehl
  2010-09-12  9:07           ` Milan Broz
  0 siblings, 2 replies; 15+ messages in thread
From: dave b @ 2010-09-12  8:36 UTC (permalink / raw)
  To: dm-crypt

On 11 September 2010 19:08, dave b <db.pub.mail@gmail.com> wrote:
>> So when I try to do other things I see the pread time go way up :)
>>
>
> %s/go/goes
> Sorry, what I meant was that when I try the dd and run this modified
> tester, if I try to perform
> another task the pread time output is significantly higher than the norm.
> If I do not attempt to do other things (try to open new programs /
> change windows) then the pread time seems to stay fairly constant. (as
> I expected :) ).
>

Should I forward my 'bug' to the linux kernel mailing list ?


* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-09-12  8:36         ` dave b
@ 2010-09-12  9:04           ` Heinz Diehl
  2010-09-12  9:07           ` Milan Broz
  1 sibling, 0 replies; 15+ messages in thread
From: Heinz Diehl @ 2010-09-12  9:04 UTC (permalink / raw)
  To: dm-crypt

On 12.09.2010, dave b wrote: 

> Should I forward my 'bug' to the linux kernel mailing list ?

I'm not sure at all what causes the behaviour you encounter; there is
the device-mapper, the crypto layer, the disk scheduler and a lot more
which could be more or less related.

There was a similar monster thread some months ago on latency issues
(which was not crypto related!): Linus did a "yum update" in the
background while reading mail with Alpine and encountered some severe
stalls. A lot of work has been put into cfq afterwards, and latency is
still an ongoing issue in the Linux kernel and on the lkml (see the patch
from M. Desnoyers which came up today regarding sched granularity).

In a nutshell: I don't know :-)


* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-09-12  8:36         ` dave b
  2010-09-12  9:04           ` Heinz Diehl
@ 2010-09-12  9:07           ` Milan Broz
  2010-09-12 10:42             ` dave b
  2010-09-12 10:45             ` dave b
  1 sibling, 2 replies; 15+ messages in thread
From: Milan Broz @ 2010-09-12  9:07 UTC (permalink / raw)
  To: dave b; +Cc: dm-crypt

On 09/12/2010 10:36 AM, dave b wrote:
> 
> Should I forward my 'bug' to the linux kernel mailing list ?

Better report it to kernel bugzilla, it is better for tracking.
(Also see https://bugzilla.kernel.org/show_bug.cgi?id=17892 )

Anyway, there are some patches waiting for inclusion in DM tree
for weeks and all fixes must follow these changes. Also replacing
io barriers in 2.6.37 can interfere here (fsync uses barriers).

Anyway, if you have some tests which you found useful for dm-crypt
testing, attach them to bugzilla too. I would like them to run
for all kernels in the future to avoid performance regressions.

Milan


* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-09-12  9:07           ` Milan Broz
@ 2010-09-12 10:42             ` dave b
  2010-09-12 10:45             ` dave b
  1 sibling, 0 replies; 15+ messages in thread
From: dave b @ 2010-09-12 10:42 UTC (permalink / raw)
  To: Milan Broz; +Cc: dm-crypt

On 12 September 2010 19:07, Milan Broz <mbroz@redhat.com> wrote:
> On 09/12/2010 10:36 AM, dave b wrote:
>>
>> Should I forward my 'bug' to the linux kernel mailing list ?
>
> Better report it to kernel bugzilla, it is better for tracking.
> (Also see https://bugzilla.kernel.org/show_bug.cgi?id=17892 )
>
> Anyway, there are some patches waiting for inclusion in DM tree
> for weeks and all fixes must follow these changes. Also replacing
> io barriers in 2.6.37 can interfere here (fsync uses barriers).
>
> Anyway, if you have some tests which you found useful for dm-crypt
> testing, attach them to bugzilla too. I would like them to run
> for all kernels in the future to avoid performance regressions.

Right, I see the issue as the following: requesting a lot of writes
and then a number of read operations from different processes leads to
a *poor* outcome.
I say this because if I do "dd if=/dev/zero of=/tmp/DELETEME",
after a short time the entire system stalls and it really is *very*
difficult to end the dd, which is writing to the disk :)


I found http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html ,
http://lwn.net/Articles/152277/ , and
http://archives.postgresql.org/pgsql-performance/2007-08/msg00234.php
interesting.
I haven't found a tunable that gives preference to reads over writes
for CFQ; deadline seems to have such a tunable.
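
The deadline tunable alluded to here is probably writes_starved, which sits next to read_expire and write_expire under /sys/block/<dev>/queue/iosched/ while deadline is the active scheduler. A minimal sketch of setting such a value from C follows, equivalent to the echo commands used earlier in the thread. The function name write_tunable is hypothetical, and the device name and value in the usage note are arbitrary examples, not recommendations.

```c
#include <stdio.h>

/*
 * Write a decimal value to a sysfs-style file.
 * Returns 0 on success, -1 on error.
 */
int write_tunable(const char *path, long value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (f == NULL)
		return -1;
	ok = fprintf(f, "%ld\n", value) > 0;
	/* fclose() flushes the buffer, so a rejected write shows up here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, write_tunable("/sys/block/sda/queue/iosched/writes_starved", 4) would need root privileges and would fail if deadline were not the active scheduler for sda, because the file would not exist.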


There is only that one modified test I posted before  --> which also
tests *read* times as well as fsync times.

I will give you the 'story' as I see it (for 2 user types) :

Background:
	Given there is a system with a 'fast' (hard drive based) permanent storage device


Feature:
	As an administrator of a file server
	I want to be able to write a large file to permanent storage
	So that I can profit by providing a 'fast' (hard drive based) file sharing service!

Scenario: A user requests to store a large file on my service and then 5 other users request existing files (not in cache)
	Given user '0' requests to store a *large* file on my service
	When the system starts to write the data
	And users '1', '2', '3', '4' request files 'A', 'B', 'C', 'D'
	Then I should see the system able to honour the requests
	And I should not see the system stall due to the large file being written


Feature:
	As a desktop user
	I want to be able to use my desktop while I save a large file to permanent storage
	So that I can profit as I can continue with my other tasks instead of sitting on my hands!

Scenario: I want to copy a large file off an e-SATA / USB drive to my hard drive
	Given I have a large file on my removable storage device
	When I copy the file to my hard drive
	And I try to open a new firefox window
	And I try to go to google.com
	And I try to open nautilus
	Then I should see the system responding to my requests within a reasonable time frame
	And the system should not be completely stalled for more than a few seconds


The other thing to note is that kcryptd was at 66% CPU time while
dd'ing files of up to 10 GB.


* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-09-12  9:07           ` Milan Broz
  2010-09-12 10:42             ` dave b
@ 2010-09-12 10:45             ` dave b
  2010-10-06 17:58               ` dave b
  1 sibling, 1 reply; 15+ messages in thread
From: dave b @ 2010-09-12 10:45 UTC (permalink / raw)
  To: Milan Broz; +Cc: dm-crypt

On 12 September 2010 19:07, Milan Broz <mbroz@redhat.com> wrote:
> On 09/12/2010 10:36 AM, dave b wrote:
>>
>> Should I forward my 'bug' to the linux kernel mailing list ?
>
> Better report it to kernel bugzilla, it is better for tracking.
> (Also see https://bugzilla.kernel.org/show_bug.cgi?id=17892 )

Will do :)


* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-09-12 10:45             ` dave b
@ 2010-10-06 17:58               ` dave b
  2010-10-24 15:32                 ` Fwd: " dave b
  0 siblings, 1 reply; 15+ messages in thread
From: dave b @ 2010-10-06 17:58 UTC (permalink / raw)
  To: Milan Broz; +Cc: dm-crypt

On 12 September 2010 21:45, dave b <db.pub.mail@gmail.com> wrote:
> On 12 September 2010 19:07, Milan Broz <mbroz@redhat.com> wrote:
>> On 09/12/2010 10:36 AM, dave b wrote:
>>>
>>> Should I forward my 'bug' to the linux kernel mailing list ?
>>
>> Better report it to kernel bugzilla, it is better for tracking.
>> (Also see https://bugzilla.kernel.org/show_bug.cgi?id=17892 )
>
> Will do :)
>

OK, no one has responded to that bug ... :/
During a system lock-up I ssh'd into the box and saw that wait time was 83% in top.
I also saw that kcryptd was using 23% of CPU time, and that only one
core was in use.
1012 root      20   0     0    0    0 S   11  0.0  30:15.46 kcryptd

The system was stalled for 20 minutes ...

load average: 19.16, 15.74, 11.43
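
The wait figure shown by top can also be logged without an interactive tool, which is useful when the box is barely responsive. It is derived from the aggregate "cpu" line of /proc/stat, whose fields are user, nice, system, idle, iowait, irq, softirq and steal, in that order. A minimal sketch follows that parses two samples of that line and computes the iowait share between them, which should be roughly what top reports as wa. The struct and function names are hypothetical, and taking the two samples a few seconds apart is left to the caller.

```c
#include <stdio.h>
#include <string.h>

struct cpu_sample {
	unsigned long long total;	/* sum of the parsed time fields */
	unsigned long long iowait;	/* time spent waiting for I/O */
};

/* Parse the aggregate "cpu ..." line of /proc/stat. Returns 0 on success. */
int parse_cpu_line(const char *line, struct cpu_sample *out)
{
	unsigned long long user, nice, sys, idle, iowait;
	unsigned long long irq = 0, softirq = 0, steal = 0;
	int n;

	/* Reject per-CPU lines such as "cpu0 ..."; only "cpu " is wanted. */
	if (strncmp(line, "cpu ", 4) != 0)
		return -1;
	n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
		   &user, &nice, &sys, &idle, &iowait, &irq, &softirq, &steal);
	if (n < 5)
		return -1;
	out->total = user + nice + sys + idle + iowait + irq + softirq + steal;
	out->iowait = iowait;
	return 0;
}

/* Percentage of CPU time spent in iowait between samples a and b. */
double iowait_percent(const struct cpu_sample *a, const struct cpu_sample *b)
{
	unsigned long long dt = b->total - a->total;

	if (dt == 0)
		return 0.0;
	return 100.0 * (double)(b->iowait - a->iowait) / (double)dt;
}
```

A high iowait share together with a busy kcryptd thread, as described above, would point at tasks blocked on I/O rather than at a CPU-bound workload, though it does not by itself identify the layer responsible.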

--
Let me take you a button-hole lower.		-- William Shakespeare, "Love's
Labour's Lost"


* Fwd: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-10-06 17:58               ` dave b
@ 2010-10-24 15:32                 ` dave b
  2010-10-24 16:31                   ` Milan Broz
  0 siblings, 1 reply; 15+ messages in thread
From: dave b @ 2010-10-24 15:32 UTC (permalink / raw)
  To: Linux Kernel

I am forwarding this to the linux kernel mailing list to see if anyone
is actually interested in this bug or not. I suspect that the patch
to allow dm-crypt to use multiple CPUs ("Scale to multiple CPUs") will
remove most of the problem but *not* all of it, as I was previously
triggering the bug on a single-core system.

I filed this as bug 18302; see
https://bugzilla.kernel.org/show_bug.cgi?id=18302 for the original thread.



---------- Forwarded message ----------
From: dave b <db.pub.mail@gmail.com>
Date: 7 October 2010 04:58
Subject: Re: [dm-crypt] [BUG] bad performance and system stalls when
using dm-crypt
To: Milan Broz <mbroz@redhat.com>
Cc: dm-crypt@saout.de


On 12 September 2010 21:45, dave b <db.pub.mail@gmail.com> wrote:
> On 12 September 2010 19:07, Milan Broz <mbroz@redhat.com> wrote:
>> On 09/12/2010 10:36 AM, dave b wrote:
>>>
>>> Should I forward my 'bug' to the linux kernel mailing list ?
>>
>> Better report it to kernel bugzilla, it is better for tracking.
>> (Also see https://bugzilla.kernel.org/show_bug.cgi?id=17892 )
>
> Will do :)
>

OK, no one has responded to that bug ... :/
During a system lock-up I ssh'd into the box and saw that wait time was 83% in top.
I also saw that kcryptd was using 23% of CPU time, and that only one
core was in use.
1012 root      20   0     0    0    0 S   11  0.0  30:15.46 kcryptd

The system was stalled for 20 minutes ...

load average: 19.16, 15.74, 11.43

--
Let me take you a button-hole lower.            -- William Shakespeare, "Love's
Labour's Lost"


* Re: Fwd: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-10-24 15:32                 ` Fwd: " dave b
@ 2010-10-24 16:31                   ` Milan Broz
  2010-10-24 16:52                     ` dave b
  0 siblings, 1 reply; 15+ messages in thread
From: Milan Broz @ 2010-10-24 16:31 UTC (permalink / raw)
  To: dave b; +Cc: Linux Kernel, device-mapper development

On 10/24/2010 05:32 PM, dave b wrote:
> I am forwarding this to the linux kernel mailing list to see if anyone
> is actually interested in this bug or not. I suspect that the patch
> to allow DM-CRYPT to use multiple cpus(Scale to multiple CPUs) will
> remove most of the problem but *not* all of it. As I previously was
> triggering the bug on a single core system.

Hi,
sorry for not updating the bug. We know about this.

The fix was expected to be based on top of Andi's dm-crypt per-cpu
patch, but unfortunately I found serious problems there
(see this thread: http://lkml.org/lkml/2010/10/20/215 ).

So there are now several known situations where dm-crypt doesn't
perform as expected. Another problem just appeared when using CFQ,
because dm(-crypt) lost the issuing process reference;
see http://lkml.org/lkml/2010/10/24/59 .

Milan


* Re: Fwd: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-10-24 16:31                   ` Milan Broz
@ 2010-10-24 16:52                     ` dave b
  2010-10-24 17:05                       ` Milan Broz
  0 siblings, 1 reply; 15+ messages in thread
From: dave b @ 2010-10-24 16:52 UTC (permalink / raw)
  To: Milan Broz; +Cc: Linux Kernel, device-mapper development

On 25 October 2010 03:31, Milan Broz <mbroz@redhat.com> wrote:
> On 10/24/2010 05:32 PM, dave b wrote:
>> I am forwarding this to the linux kernel mailing list to see if anyone
>> is actually interested in this bug or not. I suspect that the patch
>> to allow DM-CRYPT to use multiple cpus(Scale to multiple CPUs) will
>> remove most of the problem but *not* all of it. As I previously was
>> triggering the bug on a single core system.
>
> Hi,
> sorry for not updating the bug. We know about this.
>
> Fix was expected to be based on top of the Andi's dm-crypt per-cpu
> patch but unfortunately I found serious problems there
> (see this thread http://lkml.org/lkml/2010/10/20/215 )
>
> So there are now several known situations when dm-crypt doesn't
> perform as expected (another problem just appeared when using CFQ,
> (because dm(-crypt) lost the issuing process reference)
> see http://lkml.org/lkml/2010/10/24/59).

Thank you for replying  ^ ^
I was aware of the problems and I am following Andi's dm-crypt
per-cpu patch (I haven't applied or tested it). However, I didn't know
about http://lkml.org/lkml/2010/10/24/59 -->
So this may well be the root of the problem!
Also, won't http://lkml.org/lkml/2010/10/24/59  need to be altered
(again) to work with Andi's patch ?


--
There's small choice in rotten apples.		-- William Shakespeare, "The
Taming of the Shrew"


* Re: Fwd: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
  2010-10-24 16:52                     ` dave b
@ 2010-10-24 17:05                       ` Milan Broz
  0 siblings, 0 replies; 15+ messages in thread
From: Milan Broz @ 2010-10-24 17:05 UTC (permalink / raw)
  To: dave b; +Cc: Linux Kernel, device-mapper development

On 10/24/2010 06:52 PM, dave b wrote:
>> So there are now several known situations when dm-crypt doesn't
>> perform as expected (another problem just appeared when using CFQ,
>> (because dm(-crypt) lost the issuing process reference)
>> see http://lkml.org/lkml/2010/10/24/59).
> 
> Thank you for replying  ^ ^
> I was aware of the problems and I am following Andi's dm-crypt
> per-cpu patch (I haven't applied or tested it). However, I didn't know
> about http://lkml.org/lkml/2010/10/24/59 -->
> So this may well be the root of the problem!
> Also, won't http://lkml.org/lkml/2010/10/24/59  need to be altered
> (again) to work with Andi's patch ?

The CFQ-related problem should be solved; probably the device-mapper core
has to provide some help. (It is not only about dm-crypt: there are a
lot of situations where IO is submitted from a different internal process.
The proposed patch is not the proper and complete way to solve it.)

For the per-cpu patch, I am waiting for a reply; apparently some
additional work is needed there.
But I would like to solve it ASAP.

Milan

