* [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: dave b @ 2010-09-11 6:06 UTC
To: dm-crypt, bugme-daemon
I am not sure if this really is a bug or just expected behaviour with
ext3 in ordered mode with luks / dm-crypt adding some extra latency
...
----
I am experiencing really bad performance and a fair amount of system
stalls using dm-crypt (LUKS) on Debian lenny with an ext3 root (/) in
ordered mode.
"
Version: 1
Cipher name: aes
Cipher mode: cbc-essiv:sha256
Hash spec: sha1
"
/dev/mapper/foo-root on / type ext3 (rw,relatime,errors=remount-ro)
Swap is also within the luks partition.
The kernel version is 2.6.35.4.
A simple test is just to run dd if=/dev/zero of=DELETEME for a short time
and the system will stall rather a lot - it is not a complete lock up
- just the entire system is unresponsive for large periods at a time
(from around 30 seconds to 2 minutes). This may be related to
http://thread.gmane.org/gmane.linux.kernel.mm/51444 .
The system is a 6-core AMD Phenom with 4 GB of RAM - the hard drive is
a WD 1TB SATA 3 drive hooked up to a SATA 3 controller:
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA
Controller [IDE mode] (rev 40)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA
Controller [IDE mode] (rev 40) (prog-if 01 [AHCI 1.0])
Subsystem: ASUSTeK Computer Inc. Device 8443
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 42
Region 0: I/O ports at c000 [size=8]
Region 1: I/O ports at b000 [size=4]
Region 2: I/O ports at a000 [size=8]
Region 3: I/O ports at 9000 [size=4]
Region 4: I/O ports at 8000 [size=16]
Region 5: Memory at fe8ffc00 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+
Queue=0/2 Enable+
Address: 00000000fee3f00c Data: 4161
Capabilities: [70] SATA HBA <?>
Capabilities: [a4] PCIe advanced features <?>
Kernel driver in use: ahci
Kernel modules: ahci

* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: Heinz Diehl @ 2010-09-11 6:57 UTC
To: dm-crypt

On 11.09.2010, dave b wrote:

> A simple test is just to dd if=/dev/zero of=DELETEME for a short time
> and the system will stall rather a lot - it is not a complete lock up
> - just the entire system is unresponsive for large periods at a time
> (from around 30 seconds - to 2 minutes).

This is most likely due to the massive disk i/o which is generated by the
dd command.

> This may be related to
> http://thread.gmane.org/gmane.linux.kernel.mm/51444 .

I think this is completely unrelated, it's more a kind of a disk scheduler
issue. I can't compare directly, because I'm running XFS on all of my
machines, but you could try to fine-tune your disk scheduler. In any case,
writing a big file and working on the same disk at the same time will always
give you, hmmm.. "some delay", even without using encryption.

Ok, let's give it a short run on one of my machines, a 10 GB file zeroed out
using the dd command you mentioned, on /home, encrypted with dmcrypt/LUKS,
and running Theodore Ts'o's fsync-tester on the same partition:

htd@liesel:~/fs> ./fsync-tester
fsync time: 0.5354
fsync time: 0.1496
fsync time: 0.2655
fsync time: 5.1740
fsync time: 6.0126
fsync time: 3.7945
fsync time: 19.0478
fsync time: 11.2006
fsync time: 4.5590
fsync time: 0.1663
fsync time: 0.3466
fsync time: 2.2969
fsync time: 0.7246
fsync time: 0.3036
fsync time: 6.2057
fsync time: 0.1042
fsync time: 0.7644
fsync time: 0.9193
fsync time: 0.1738
fsync time: 0.7413
fsync time: 18.2807
fsync time: 1.1971
fsync time: 0.2808
fsync time: 0.2364
fsync time: 0.3220
fsync time: 0.2108
fsync time: 0.2831
fsync time: 2.6023
fsync time: 14.0767
fsync time: 23.4450
fsync time: 3.2213
fsync time: 4.3488
fsync time: 12.5578
fsync time: 0.2359
^C
htd@liesel:~/fs>

You can see stalls up to 23 secs here, too. I'm using the latest kernel
2.6.36-rc3 from Linus' git repository, with the "global workqueue per cpu"
patch from Andi Kleen on top of it (which should not give any performance
boost here in this case). Scheduler is cfq, tuned this way:

echo cfq > /sys/block/sda/queue/scheduler
echo "1" > /sys/block/sda/queue/iosched/low_latency
echo "8" > /sys/block/sda/queue/iosched/slice_idle
echo "8" > /sys/block/sda/queue/iosched/quantum

liesel:/home/htd/fs # xfs_repair -V
xfs_repair version 2.10.1

mount:
/dev/mapper/home on /home type xfs (rw,noatime,logbsize=256k,logbufs=2,nobarrier,inode64)

/*
 * fsync-tester.c
 *
 * Written by Theodore Ts'o, 3/21/09.
 *
 * This file may be redistributed under the terms of the GNU Public
 * License, version 2.
 */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>
#include <fcntl.h>
#include <string.h>

#define SIZE (32768*32)

static float timeval_subtract(struct timeval *tv1, struct timeval *tv2)
{
        return ((tv1->tv_sec - tv2->tv_sec) +
                ((float) (tv1->tv_usec - tv2->tv_usec)) / 1000000);
}

int main(int argc, char **argv)
{
        int fd;
        struct timeval tv, tv2;
        char buf[SIZE];

        fd = open("fsync-tester.tst-file", O_WRONLY|O_CREAT);
        if (fd < 0) {
                perror("open");
                exit(1);
        }
        memset(buf, 'a', SIZE);
        while (1) {
                pwrite(fd, buf, SIZE, 0);
                gettimeofday(&tv, NULL);
                fsync(fd);
                gettimeofday(&tv2, NULL);
                printf("fsync time: %5.4f\n",
                       timeval_subtract(&tv2, &tv));
                sleep(1);
        }
}
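
The timing step in the tester above can also be sketched with a monotonic clock, so that wall-clock adjustments during a long run do not distort the reported latencies. This is only a minimal sketch, assuming a POSIX system that provides clock_gettime(CLOCK_MONOTONIC); the helper names elapsed_seconds and timed_fsync are made up for illustration and are not part of fsync-tester.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Seconds between two clock readings (hypothetical helper). */
double elapsed_seconds(const struct timespec *start,
                       const struct timespec *end)
{
        assert(start != NULL && end != NULL);
        return (double)(end->tv_sec - start->tv_sec)
             + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * Time a single fsync() on an open descriptor (hypothetical helper).
 * Returns the latency in seconds, or -1.0 if fsync() fails.
 */
double timed_fsync(int fd)
{
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        if (fsync(fd) != 0)
                return -1.0;
        clock_gettime(CLOCK_MONOTONIC, &end);
        return elapsed_seconds(&start, &end);
}
```

In a loop like the one in fsync-tester.c, timed_fsync(fd) would take the place of the gettimeofday()/fsync()/gettimeofday() sequence, called right after the pwrite().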

* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: dave b @ 2010-09-11 8:11 UTC
To: dm-crypt

On 11 September 2010 16:57, Heinz Diehl <htd@fancy-poultry.org> wrote:
> On 11.09.2010, dave b wrote:
>
>> A simple test is just to dd if=/dev/zero of=DELETEME for a short time
>> and the system will stall rather a lot - it is not a complete lock up
>> - just the entire system is unresponsive for large periods at a time
>> (from around 30 seconds - to 2 minutes).
>
> This is most likely due to the massive disk i/o which is generated by the
> dd command.

I don't think *this* should stall the entire system :)

>
>> This may be related to
>> http://thread.gmane.org/gmane.linux.kernel.mm/51444 .
>
> I think this is completely unrelated, it's more a kind of a disk scheduler
> issue. I can't compare directly, because I'm running XFS on all of my
> machines, but you could try to fine-tune your disk scheduler. In any case,
> writing a big file and working on the same disk at the same time will always
> give you, hmmm.. "some delay", even without using encryption.

Agreed. :)

>
> You can see stalls up to 23 secs here, too. I'm using the latest kernel
> 2.6.36-rc3 from Linus' git repository, with the "global workqueue per cpu"
> patch from Andi Kleen on top of it (which should not give any performance
> boost here in this case). Scheduler is cfq, tuned this way:

My cfq is the same ^^ (settings)
Well it isn't just dd - if you do a lot of grepping / find . etc. - like
rkhunter does then the system also noticeably stalls :/
I do not think the test you attached is a good representation of the
*actual* behaviour.
What it looks like to me is that there is a total collapse of scheduled
reads vs writes for *all* programs (other than the process causing the
io work) for a given period.

When I test the deadline scheduler my system is slightly more usable :)
- noop is a bit iffy.

Without dding:
atop -->
DSK | sda | busy 64% | read 78040 | write 133104 | avio 3 ms |

hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads: 6382 MB in 2.00 seconds = 3191.73 MB/sec
 Timing buffered disk reads: 382 MB in 3.02 seconds = 126.67 MB/sec

While dding: (dd if=/dev/zero of=TEMP count=15553600)
atop -->
DSK | sda | busy 98% | read 5 | write 2057 | avio 4 ms |
(it spiked at 98% for extended periods vs the one offs seen during
experimenting with noop and deadline)

-----------
Some of the output while dding using:
dd if=/dev/zero of=TEMP count=15553600

CFQ
924 6.93s 0.00s 0K 0K 0K 0K -- - R 68% kcryptd
fsync time: 0.0275
fsync time: 0.0232
fsync time: 0.0274
fsync time: 0.0215
fsync time: 0.7706
fsync time: 9.3548
fsync time: 14.4264
fsync time: 10.4625
fsync time: 12.5968
fsync time: 16.5984
fsync time: 15.4739
fsync time: 2.3007
fsync time: 0.0249
fsync time: 0.0513

Deadline:
fsync time: 0.1949
fsync time: 6.9050
fsync time: 14.3582
fsync time: 13.0077
fsync time: 11.6368
fsync time: 12.3486
fsync time: 5.1431
etc.

Really you can see that the deadline is 'worse' even though my system is
more usable than with the CFQ.

Let's switch to noop ;) (output for the entire dd run). The system is
still more usable than on the CFQ.
fsync time: 0.0243
fsync time: 0.0100
fsync time: 0.0186
fsync time: 1.1323
fsync time: 14.5452
fsync time: 18.4730
fsync time: 18.9720
fsync time: 15.3721
fsync time: 9.4640
fsync time: 1.6990
fsync time: 0.0179
fsync time: 0.0178
fsync time: 0.0477
fsync time: 0.0263
fsync time: 0.0200
fsync time: 0.0279

I think we need a better test :-)
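
To compare the scheduler runs in the message above by more than eye, the fsync samples can be reduced to a worst case and a mean. The following is a minimal sketch: summarize and struct latency_summary are hypothetical names, and the two arrays simply copy the CFQ and deadline figures quoted in that message (the deadline list was cut off with "etc.", so it is incomplete).

```c
#include <assert.h>
#include <stddef.h>

struct latency_summary {
        double max;   /* worst single fsync latency, in seconds */
        double mean;  /* arithmetic mean of all samples, in seconds */
};

/* Reduce n latency samples to their maximum and mean. */
struct latency_summary summarize(const double *samples, size_t n)
{
        struct latency_summary s;
        double sum = 0.0;
        size_t i;

        assert(samples != NULL && n > 0);
        s.max = samples[0];
        for (i = 0; i < n; i++) {
                if (samples[i] > s.max)
                        s.max = samples[i];
                sum += samples[i];
        }
        s.mean = sum / (double)n;
        return s;
}

/* fsync times reported for the CFQ run in the message. */
const double cfq_samples[] = {
        0.0275, 0.0232, 0.0274, 0.0215, 0.7706, 9.3548, 14.4264,
        10.4625, 12.5968, 16.5984, 15.4739, 2.3007, 0.0249, 0.0513
};

/* fsync times reported for the deadline run (truncated in the message). */
const double deadline_samples[] = {
        0.1949, 6.9050, 14.3582, 13.0077, 11.6368, 12.3486, 5.1431
};
```

For the samples as quoted, this gives a mean of roughly 5.87 s for CFQ and 9.08 s for deadline, which matches the remark that deadline looks 'worse' on this metric even though the desktop felt more usable. It also suggests that fsync latency alone does not capture the interactive stalls.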

* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: dave b @ 2010-09-11 9:02 UTC
To: dm-crypt

Here is what I would consider a better test, imho :)

/*
 * fsync-tester.c
 *
 * Written by Theodore Ts'o, 3/21/09.
 *
 * This file may be redistributed under the terms of the GNU Public
 * License, version 2.
 */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>
#include <fcntl.h>
#include <string.h>

#define SIZE (32768*32)

static float timeval_subtract(struct timeval *tv1, struct timeval *tv2)
{
        return ((tv1->tv_sec - tv2->tv_sec) +
                ((float) (tv1->tv_usec - tv2->tv_usec)) / 1000000);
}

int main(int argc, char **argv)
{
        int fd;
        struct timeval tv, tv2, tv3, tv4;
        char buf[SIZE];

        fd = open("fsync-tester.tst-file", O_RDWR|O_CREAT);
        if (fd < 0) {
                perror("open");
                exit(1);
        }
        memset(buf, 'a', SIZE);
        while (1) {
                pwrite(fd, buf, SIZE, 0);
                gettimeofday(&tv, NULL);
                fsync(fd);
                gettimeofday(&tv2, NULL);
                printf("fsync time: %5.4f\n",
                       timeval_subtract(&tv2, &tv));
                sleep(1);

                gettimeofday(&tv3, NULL);
                pread(fd, buf, SIZE, 0);
                gettimeofday(&tv4, NULL);
                printf("pread time: %5.4f\n", timeval_subtract(&tv4, &tv3));
                sleep(1);
        }
}

--- fsync-tester.c 2010-09-11 17:25:53.000000000 +1000
+++ mine.c 2010-09-11 18:57:36.000000000 +1000
@@ -27,10 +27,10 @@
 int main(int argc, char **argv)
 {
         int fd;
-        struct timeval tv, tv2;
+        struct timeval tv, tv2, tv3, tv4;
         char buf[SIZE];
 
-        fd = open("fsync-tester.tst-file", O_WRONLY|O_CREAT);
+        fd = open("fsync-tester.tst-file", O_RDWR|O_CREAT);
         if (fd < 0) {
                 perror("open");
                 exit(1);
@@ -43,6 +43,12 @@
                 gettimeofday(&tv2, NULL);
                 printf("fsync time: %5.4f\n",
                        timeval_subtract(&tv2, &tv));
                 sleep(1);
+
+                gettimeofday(&tv3, NULL);
+                pread(fd, buf, SIZE, 0);
+                gettimeofday(&tv4, NULL);
+                printf("pread time: %5.4f\n", timeval_subtract(&tv4, &tv3));
+                sleep(1);
         }
 }

This still isn't as good a test as I would like :)

./mine.out
fsync time: 0.0264
pread time: 0.0009
fsync time: 0.0415
pread time: 0.0008
fsync time: 2.4119
pread time: 1.0698
fsync time: 19.5375
pread time: 0.0009
fsync time: 15.9053
pread time: 0.0009
fsync time: 6.3872
pread time: 3.6912
fsync time: 6.8910
pread time: 0.0099
fsync time: 4.0445
pread time: 0.3053
fsync time: 8.3816
pread time: 0.3386
fsync time: 6.0976
pread time: 0.0009
fsync time: 0.0236
pread time: 0.0008
fsync time: 0.0255

So when I try to do other things I see the pread time go way up :)

* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: dave b @ 2010-09-11 9:08 UTC
To: dm-crypt

> So when I try to do other things I see the pread time go way up :)
>

%s/go/goes
Sorry, what I meant was that when I try the dd and run this modified
tester, if I try to perform another task the pread time output is
significantly higher than the norm.
If I do not attempt to do other things (try to open new programs /
change windows) then the pread time seems to stay fairly constant (as
I expected :) ).

* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: dave b @ 2010-09-12 8:36 UTC
To: dm-crypt

On 11 September 2010 19:08, dave b <db.pub.mail@gmail.com> wrote:
>> So when I try to do other things I see the pread time go way up :)
>>
>
> %s/go/goes
> Sorry, what I meant was that when I try the dd and run this modified
> tester, if I try to perform
> another task the pread time output is significantly higher than the norm.
> If I do not attempt to do other things (try to open new programs /
> change windows) then the pread time seems to stay fairly constant. (as
> I expected :) ).
>

Should I forward my 'bug' to the linux kernel mailing list ?

* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: Heinz Diehl @ 2010-09-12 9:04 UTC
To: dm-crypt

On 12.09.2010, dave b wrote:

> Should I forward my 'bug' to the linux kernel mailing list ?

I'm not sure at all what causes the behaviour you encounter, there is
the device-mapper, the crypto layer, the disk scheduler and a lot more
which could be more or less related.

There was a similar monster thread some months ago on latency issues
(which was not crypto related!), Linus did a "yum update" in the
background while reading mail with Alpine and encountered some severe
stalls. A lot of work has been put into cfq afterwards, and latency is
a still ongoing issue in the Linux kernel and on the lkml (see the patch
from M. Desnoyers which came up today regarding sched granularity).

In a nutshell: I don't know :-)

* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: Milan Broz @ 2010-09-12 9:07 UTC
To: dave b
Cc: dm-crypt

On 09/12/2010 10:36 AM, dave b wrote:
>
> Should I forward my 'bug' to the linux kernel mailing list ?

Better report it to kernel bugzilla, it is better for tracking.
(Also see https://bugzilla.kernel.org/show_bug.cgi?id=17892 )

Anyway, there are some patches waiting for inclusion in DM tree
for weeks and all fixes must follow these changes. Also replacing
io barriers in 2.6.37 can interfere here (fsync uses barriers).

Anyway, if you have some tests which you found useful for dm-crypt
testing, attach them to bugzilla too. I would like them to run
for all kernels in the future to avoid performance regressions.

Milan

* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: dave b @ 2010-09-12 10:42 UTC
To: Milan Broz
Cc: dm-crypt

On 12 September 2010 19:07, Milan Broz <mbroz@redhat.com> wrote:
> On 09/12/2010 10:36 AM, dave b wrote:
>>
>> Should I forward my 'bug' to the linux kernel mailing list ?
>
> Better report it to kernel bugzilla, it is better for tracking.
> (Also see https://bugzilla.kernel.org/show_bug.cgi?id=17892 )
>
> Anyway, there are some patches waiting for inclusion in DM tree
> for weeks and all fixes must follow these changes. Also replacing
> io barriers in 2.6.37 can interfere here (fsync uses barriers).
>
> Anyway, if you have some tests which you found useful for dm-crypt
> testing, attach them to bugzilla too. I would like them to run
> for all kernels in the future to avoid performance regressions.

Right, I see the issue as the following: requesting a lot of writes and
then a number of read operations from different processes leads to a
*poor* outcome. I say this because if I do
"dd if=/dev/zero of=/tmp/DELETEME", after a short time the entire system
stalls and it really is *very* difficult to end the dd, which is writing
to the disk :)

I found
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html ,
http://lwn.net/Articles/152277/ , and
http://archives.postgresql.org/pgsql-performance/2007-08/msg00234.php
interesting.

I haven't found a tweakable for giving preference to reads over writes
for the CFQ; deadline seems to have such a tweakable.

There is only that one modified test I posted before --> which also
tests *read* times as well as fsync times.

I will give you the 'story' as I see it (for 2 user types):

Background:
  Given there is a system with a 'fast' (hard drive based) permanent
  storage device

Feature:
  As an administrator of a file server
  I want to be able to write a large file to permanent storage
  So that I can profit via providing a 'fast' (hard drive based) file
  sharing service!

Scenario: A user requests to store a large file on my service and then
5 other users request existing files (not in cache)
  Given user '0' requests to store a *large* file on my service
  When the system starts to write the data
  And user '1', '2', '3', '4' request files 'A', 'B', 'C', 'D'
  Then I should see the system able to honour the requests
  And I should not see the system stall due to the large file being
  written

Feature:
  As a Desktop User
  I want to be able to use my desktop while I save a large file to
  permanent storage
  So that I can profit as I can continue with my other tasks instead of
  sitting on my hands!

Scenario: I want to copy a large file off an e-sata / usb drive to my
hard drive
  Given I have a large file on my removable storage device
  When I copy the file to my hard drive
  And I try to open a new firefox window
  And I try to go to google.com
  And I try to open nautilus
  Then I should see the system responding to my requests within a
  reasonable time frame
  And the system should not be completely stalled for more than a few
  seconds

The other thing to note is that kcryptd was at 66% cpu time during
dd'ing files <=10gb.
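
The deadline tweakable mentioned in the message above is presumably writes_starved, which sits next to read_expire and write_expire under /sys/block/<dev>/queue/iosched/ while deadline is the active scheduler. A minimal sketch of setting such a tunable from C follows, assuming the device is called sda and that the caller is root; iosched_path and write_tunable are hypothetical helpers, and any values passed in are illustrative rather than recommendations.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/block/<dev>/queue/iosched/<tunable>" into buf.
 * Returns 0 on success, -1 if a name contains '/' or the path
 * does not fit into the buffer.
 */
int iosched_path(char *buf, size_t size, const char *dev, const char *tunable)
{
        int n;

        assert(buf != NULL && dev != NULL && tunable != NULL);
        if (strchr(dev, '/') != NULL || strchr(tunable, '/') != NULL)
                return -1;
        n = snprintf(buf, size, "/sys/block/%s/queue/iosched/%s", dev, tunable);
        return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write a decimal value to a sysfs file. Returns 0 on success, -1 on error. */
int write_tunable(const char *path, long value)
{
        FILE *f = fopen(path, "w");
        int ok;

        if (f == NULL)
                return -1;
        ok = fprintf(f, "%ld\n", value) > 0;
        if (fclose(f) != 0)
                ok = 0;
        return ok ? 0 : -1;
}
```

For instance, building the path for "sda" and "writes_starved" and then calling write_tunable(path, 4) would, as far as the deadline documentation describes it, let reads be preferred over writes more often before a write batch is dispatched. Whether that helps with the stalls described in this thread is untested here.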

* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: dave b @ 2010-09-12 10:45 UTC
To: Milan Broz
Cc: dm-crypt

On 12 September 2010 19:07, Milan Broz <mbroz@redhat.com> wrote:
> On 09/12/2010 10:36 AM, dave b wrote:
>>
>> Should I forward my 'bug' to the linux kernel mailing list ?
>
> Better report it to kernel bugzilla, it is better for tracking.
> (Also see https://bugzilla.kernel.org/show_bug.cgi?id=17892 )

Will do :)

* Re: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: dave b @ 2010-10-06 17:58 UTC
To: Milan Broz
Cc: dm-crypt

On 12 September 2010 21:45, dave b <db.pub.mail@gmail.com> wrote:
> On 12 September 2010 19:07, Milan Broz <mbroz@redhat.com> wrote:
>> On 09/12/2010 10:36 AM, dave b wrote:
>>>
>>> Should I forward my 'bug' to the linux kernel mailing list ?
>>
>> Better report it to kernel bugzilla, it is better for tracking.
>> (Also see https://bugzilla.kernel.org/show_bug.cgi?id=17892 )
>
> Will do :)
>

Ok no one has responded to that bug ... :/
On a system lock up I ssh'd into the box and saw wait time was 83% in top.
I also saw that kcryptd was using 23% of cpu time - and that only one
core was in use.

1012 root 20 0 0 0 0 S 11 0.0 30:15.46 kcryptd

The system was stalled for 20 minutes ...
load average: 19.16, 15.74, 11.43

--
Let me take you a button-hole lower.
 -- William Shakespeare, "Love's Labour's Lost"

* Fwd: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: dave b @ 2010-10-24 15:32 UTC
To: Linux Kernel

I am forwarding this to the linux kernel mailing list to see if anyone
is actually interested in this bug or not. I suspect that the patch
to allow DM-CRYPT to use multiple cpus (Scale to multiple CPUs) will
remove most of the problem but *not* all of it, as I was previously
triggering the bug on a single core system.
See https://bugzilla.kernel.org/show_bug.cgi?id=18302 for the original thread.
I filed this as bug 18302 at
https://bugzilla.kernel.org/show_bug.cgi?id=18302 .

---------- Forwarded message ----------
From: dave b <db.pub.mail@gmail.com>
Date: 7 October 2010 04:58
Subject: Re: [dm-crypt] [BUG] bad performance and system stalls when
using dm-crypt
To: Milan Broz <mbroz@redhat.com>
Cc: dm-crypt@saout.de

On 12 September 2010 21:45, dave b <db.pub.mail@gmail.com> wrote:
> On 12 September 2010 19:07, Milan Broz <mbroz@redhat.com> wrote:
>> On 09/12/2010 10:36 AM, dave b wrote:
>>>
>>> Should I forward my 'bug' to the linux kernel mailing list ?
>>
>> Better report it to kernel bugzilla, it is better for tracking.
>> (Also see https://bugzilla.kernel.org/show_bug.cgi?id=17892 )
>
> Will do :)
>

Ok no one has responded to that bug ... :/
On a system lock up I ssh'd into the box and saw wait time was 83% in top.
I also saw that kcryptd was using 23% of cpu time - and that only one
core was in use.

1012 root 20 0 0 0 0 S 11 0.0 30:15.46 kcryptd

The system was stalled for 20 minutes ...
load average: 19.16, 15.74, 11.43

--
Let me take you a button-hole lower.
 -- William Shakespeare, "Love's Labour's Lost"

* Re: Fwd: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: Milan Broz @ 2010-10-24 16:31 UTC
To: dave b
Cc: Linux Kernel, device-mapper development

On 10/24/2010 05:32 PM, dave b wrote:
> I am forwarding this to the linux kernel mailing list to see if anyone
> is actually interested with this bug or not. I suspect that the patch
> to allow DM-CRYPT to use multiple cpus(Scale to multiple CPUs) will
> remove most of the problem but *not* all of it. As I previously was
> triggering the bug on a single core system.

Hi,
sorry for not updating the bug. We know about this.

Fix was expected to be based on top of the Andi's dm-crypt per-cpu
patch but unfortunately I found serious problems there
(see this thread http://lkml.org/lkml/2010/10/20/215 )

So there are now several known situations when dm-crypt doesn't
perform as expected (another problem just appeared when using CFQ,
(because dm(-crypt) lost the issuing process reference)
see http://lkml.org/lkml/2010/10/24/59).

Milan

* Re: Fwd: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: dave b @ 2010-10-24 16:52 UTC
To: Milan Broz
Cc: Linux Kernel, device-mapper development

On 25 October 2010 03:31, Milan Broz <mbroz@redhat.com> wrote:
> On 10/24/2010 05:32 PM, dave b wrote:
>> I am forwarding this to the linux kernel mailing list to see if anyone
>> is actually interested with this bug or not. I suspect that the patch
>> to allow DM-CRYPT to use multiple cpus(Scale to multiple CPUs) will
>> remove most of the problem but *not* all of it. As I previously was
>> triggering the bug on a single core system.
>
> Hi,
> sorry for not updating the bug. We know about this.
>
> Fix was expected to be based on top of the Andi's dm-crypt per-cpu
> patch but unfortunately I found serious problems there
> (see this thread http://lkml.org/lkml/2010/10/20/215 )
>
> So there are now several known situations when dm-crypt doesn't
> perform as expected (another problem just appeared when using CFQ,
> (because dm(-crypt) lost the issuing process reference)
> see http://lkml.org/lkml/2010/10/24/59).

Thank you for replying ^ ^
I was aware of the problems and I am following Andi's dm-crypt
per-cpu patch (I haven't applied or tested it). However, I didn't know
about http://lkml.org/lkml/2010/10/24/59 -->
So this may well be the root of the problem!
Also, won't http://lkml.org/lkml/2010/10/24/59 need to be altered
(again) to work with Andi's patch ?

--
There's small choice in rotten apples.
 -- William Shakespeare, "The Taming of the Shrew"

* Re: Fwd: [dm-crypt] [BUG] bad performance and system stalls when using dm-crypt
From: Milan Broz @ 2010-10-24 17:05 UTC
To: dave b
Cc: Linux Kernel, device-mapper development

On 10/24/2010 06:52 PM, dave b wrote:
>> So there are now several known situations when dm-crypt doesn't
>> perform as expected (another problem just appeared when using CFQ,
>> (because dm(-crypt) lost the issuing process reference)
>> see http://lkml.org/lkml/2010/10/24/59).
>
> Thank for you replying ^ ^
> I was aware with the problems and I am following Andi's dm-crypt
> per-cpu patch (I haven't applied or tested it). However, I didn't know
> about http://lkml.org/lkml/2010/10/24/59 -->
> So this may well be the root of the problem!
> Also, won't http://lkml.org/lkml/2010/10/24/59 need to be altered
> (again) to work with Andi's patch ?

The CFQ-related problem should be solved - probably the device-mapper
core has to provide some help.
(It is not only about dm-crypt; there are a lot of situations where IO
is submitted from a different internal process. The proposed patch is
not the proper and complete way to solve it.)

For the per-cpu patch - I am waiting for a reply; apparently some
additional work is needed there. But I would like to solve it ASAP.

Milan