* fsync stuck at jbd2_log_wait_commit on NVMe devices
@ 2015-08-12 16:57 Roy Yang
  2015-08-12 18:37 ` Theodore Ts'o
  0 siblings, 1 reply; 7+ messages in thread

From: Roy Yang @ 2015-08-12 16:57 UTC (permalink / raw)
To: linux-ext4@vger.kernel.org

Hi EXT4 Experts,

We came across an ext4 journaling problem. I did some searching on high I/O
wait and found a lot of links, but none of them matched our case. The
backtrace when the system is hung is:

Aug  7 11:31:33 2f4526ab kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...
Aug  7 11:31:33 2f4526ab kernel: Call Trace:
Aug  7 11:31:33 2f4526ab kernel: [<ffffffff816096a9>] schedule+0x29/0x70
Aug  7 11:31:33 2f4526ab kernel: [<ffffffffa031d785>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
Aug  7 11:31:33 2f4526ab kernel: [<ffffffff81098230>] ? wake_up_bit+0x30/0x30
Aug  7 11:31:33 2f4526ab kernel: [<ffffffffa031fe32>] jbd2_complete_transaction+0x52/0xa0 [jbd2]
Aug  7 11:31:33 2f4526ab kernel: [<ffffffffa03339f2>] ext4_sync_file+0x292/0x320 [ext4]
Aug  7 11:31:33 2f4526ab kernel: [<ffffffff811f6ba5>] do_fsync+0x65/0xa0
Aug  7 11:31:33 2f4526ab kernel: [<ffffffff811f6e70>] SyS_fsync+0x10/0x20
Aug  7 11:31:33 2f4526ab kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b

At the application layer, we use the fsync API heavily. When the I/O load is
heavy, we see many I/Os that cannot finish within 120 seconds because JBD2
cannot finish the journaling.
We also see that when this happens, the JBD2 info shows very high values for
the following two items:

$ cat /proc/fs/jbd2/nvme0n1/info
85162438 transactions (85162436 requested), each up to 8192 blocks
average:
  0ms waiting for transaction
  0ms request delay
  1ms running transaction            <==== this will be very high, 3 to 6 seconds
  0ms transaction was being locked
  0ms flushing data (in ordered mode)
  0ms logging transaction
  153us average transaction commit time   <=== this will be very high, half to one second
  35 handles per transaction
  2 blocks per transaction
  3 logged blocks per transaction

Our environment:

$ mount
/dev/nvme0n1 on /logs type ext4 (rw,noatime,data=ordered)

$ uname -a
Linux a69ac961 3.10.0-229.7.2.el7.x86_64 #1 SMP Tue Jun 23 22:06:11 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)

$ sudo tune2fs -l /dev/nvme0n1
tune2fs 1.42.9 (28-Dec-2013)
Filesystem volume name:   cty-data
Last mounted on:          <not available>
Filesystem UUID:          227fde0d-a3ae-4236-9d32-46ef0091b3b3
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              91062272
Block count:              364226560
Reserved block count:     3642265
Free blocks:              126192987
Free inodes:              90916973
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Wed Aug  5 13:00:41 2015
Last mount time:          Mon Aug 10 18:32:59 2015
Last write time:          Mon Aug 10 18:32:59 2015
Mount count:              6
Maximum mount count:      -1
Last checked:             Wed Aug  5 13:00:41 2015
Check interval:           0
(<none>)
Lifetime writes:          135 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      c8f8aa7c-583b-4251

Any suggestions or hints on how to debug this problem would be appreciated!

Thank you very much,

Roy
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: fsync stuck at jbd2_log_wait_commit on NVMe devices
  2015-08-12 16:57 fsync stuck at jbd2_log_wait_commit on NVMe devices Roy Yang
@ 2015-08-12 18:37 ` Theodore Ts'o
  2015-08-12 19:28   ` Roy Yang
  0 siblings, 1 reply; 7+ messages in thread

From: Theodore Ts'o @ 2015-08-12 18:37 UTC (permalink / raw)
To: Roy Yang; +Cc: linux-ext4@vger.kernel.org

Hi Roy,

My suggestion is to collect information from the jbd2_run_stats and
jbd2_checkpoint_stats tracepoints:

    cd /sys/kernel/debug/tracing
    echo 1 > events/jbd2/jbd2_run_stats/enable
    echo 1 > events/jbd2/jbd2_checkpoint_stats/enable
    cat trace_pipe > /tmp/traces &
    tail -f /tmp/traces

The jbd2_handle_stats tracepoint can be informative, but it's also far
more voluminous.

That will give us a hint where things are getting bottlenecked.

What sort of workload is your application doing?  Is it just primarily
doing random writes into a preallocated file?  Is it creating or
deleting files?  Extending or truncating files?  etc.

					- Ted

^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: fsync stuck at jbd2_log_wait_commit on NVMe devices
  2015-08-12 18:37 ` Theodore Ts'o
@ 2015-08-12 19:28   ` Roy Yang
  2015-08-12 21:43     ` Roy Yang
  0 siblings, 1 reply; 7+ messages in thread

From: Roy Yang @ 2015-08-12 19:28 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4@vger.kernel.org

Hi Ted,

I will collect some info about jbd2 and get back to this list.

To answer your questions: our workload is mostly sequential writes from a
couple of daemons. Every second, we create 6 to 10 files and write around
8 MB into them. We only extend or truncate when needed, and I am fairly
sure we rarely do that on the NVMe devices.

Reading the kernel documentation, I also see that the ext4 data=ordered
option combines metadata and data blocks for journaling. If the data
blocks are large and the workload is heavy, will JBD2 become saturated?

Thanks,

Roy

----------------------------------------
> Date: Wed, 12 Aug 2015 14:37:50 -0400
> From: tytso@mit.edu
> To: bsdnet@outlook.com
> CC: linux-ext4@vger.kernel.org
> Subject: Re: fsync stuck at jbd2_log_wait_commit on NVMe devices
>
> Hi Roy,
>
> My suggestion is to collect information from the jbd2_run_stats and
> jbd2_checkpoint_stats tracepoints.
>
> cd /sys/kernel/debug/tracing
> echo 1 > events/jbd2/jbd2_run_stats/enable
> echo 1 > events/jbd2/jbd2_checkpoint_stats/enable
> cat trace_pipe > /tmp/traces &
> tail -f /tmp/traces
>
> The jbd2_handle_stats tracepoint can be informative, but it's also far
> more voluminous.
>
> That will give us a hint where things are getting bottlenecked.
>
> What sort of workload is your application doing? Is it just primarily
> doing random writes into a preallocated file? Is it creating or
> deleting files? Extending or truncating files? etc.
>
> - Ted

^ permalink raw reply [flat|nested] 7+ messages in thread
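[Editor's note: Roy's workload description above (6 to 10 new files per second, roughly 8 MB each, fsync'd) is concrete enough to sketch. The snippet below is a hypothetical reproduction for testing on a scratch filesystem, not the actual daemon code; the directory, file names, and chunk size are assumptions.]

```python
import os

def write_burst(dirpath, second, n_files=6, file_size=8 * 1024 * 1024,
                chunk=1024 * 1024):
    """Create n_files new files and write ~file_size bytes to each,
    fsyncing at the end, like the daemons Roy describes."""
    buf = b"x" * chunk
    for i in range(n_files):
        # hypothetical naming scheme, one file per (second, index) pair
        path = os.path.join(dirpath, "log-%d-%d.dat" % (second, i))
        with open(path, "wb") as f:
            for _ in range(file_size // chunk):
                f.write(buf)
            f.flush()
            # forces a jbd2 commit; in data=ordered mode the newly
            # allocated data blocks must be flushed before the commit
            os.fsync(f.fileno())

# one simulated "second" of the daemons' load
# write_burst("/mnt/scratch", 0)
```

Because every file is newly created, each fsync() drags freshly allocated data blocks plus inode/bitmap metadata into the journal commit, which is exactly the pattern Ted probes with his questions about creating versus overwriting files.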
* RE: fsync stuck at jbd2_log_wait_commit on NVMe devices
  2015-08-12 19:28 ` Roy Yang
@ 2015-08-12 21:43   ` Roy Yang
  2015-08-12 22:34     ` Theodore Ts'o
  0 siblings, 1 reply; 7+ messages in thread

From: Roy Yang @ 2015-08-12 21:43 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4@vger.kernel.org

Hi Ted,

I got some JBD2 tracing logs by running our test program; on this node most
runs take 10 seconds or so, while on a normal system the same test takes
only 2 seconds. We saw the spike in the "running" time, but after checking
the kernel source I still don't understand what it means. Any insights?

<...>-220601 [015] .... 181555.139321: jbd2_run_stats: dev 259,3 tid 1064541 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 4 blocks 1 blocks_logged 2
<...>-220601 [015] .... 181555.141161: jbd2_run_stats: dev 259,3 tid 1064542 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 4 blocks 1 blocks_logged 2
<...>-220601 [015] .... 181555.143399: jbd2_run_stats: dev 259,3 tid 1064543 wait 0 request_delay 0 running 1 locked 0 flushing 0 logging 0 handle_count 4 blocks 3 blocks_logged 4
<...>-220601 [015] .... 181555.143710: jbd2_run_stats: dev 259,3 tid 1064544 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 4 blocks 1 blocks_logged 2
<...>-220601 [015] .... 181555.143840: jbd2_run_stats: dev 259,3 tid 1064545 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 1 blocks 4 blocks_logged 5
<...>-220601 [015] .... 181555.176298: jbd2_run_stats: dev 259,3 tid 1064546 wait 0 request_delay 0 running 32 locked 0 flushing 0 logging 0 handle_count 31 blocks 5 blocks_logged 6
jbd2/nvme0n1p1--964 [014] .... 181555.189997: jbd2_run_stats: dev 259,1 tid 293294 wait 0 request_delay 0 running 5000 locked 0 flushing 0 logging 0 handle_count 15613 blocks 4 blocks_logged 5
<...>-220601 [015] ....
181556.126407: jbd2_run_stats: dev 259,3 tid 1064547 wait 0 request_delay 0 running 929 locked 0 flushing 0 logging 0 handle_count 30 blocks 3 blocks_logged 4
<...>-220601 [014] .... 181556.159874: jbd2_run_stats: dev 259,3 tid 1064548 wait 0 request_delay 0 running 10 locked 0 flushing 0 logging 0 handle_count 29 blocks 1 blocks_logged 2
<...>-220601 [000] .... 181557.123872: jbd2_run_stats: dev 259,3 tid 1064549 wait 0 request_delay 0 running 941 locked 0 flushing 0 logging 0 handle_count 30 blocks 3 blocks_logged 4
<...>-220601 [015] .... 181558.135168: jbd2_run_stats: dev 259,3 tid 1064550 wait 0 request_delay 1 running 1 locked 0 flushing 0 logging 0 handle_count 8 blocks 6 blocks_logged 7
<...>-220601 [015] .... 181558.138895: jbd2_run_stats: dev 259,3 tid 1064551 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 1 blocks 4 blocks_logged 5
<...>-220601 [015] .... 181558.139091: jbd2_run_stats: dev 259,3 tid 1064552 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 1 handle_count 2 blocks 4 blocks_logged 5
<...>-220601 [015] .... 181558.139886: jbd2_run_stats: dev 259,3 tid 1064553 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 12 blocks 6 blocks_logged 7
<...>-220601 [015] .... 181558.147189: jbd2_run_stats: dev 259,3 tid 1064554 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 1 blocks 4 blocks_logged 5

Thanks,

Roy

----------------------------------------
> From: bsdnet@outlook.com
> To: tytso@mit.edu
> CC: linux-ext4@vger.kernel.org
> Subject: RE: fsync stuck at jbd2_log_wait_commit on NVMe devices
> Date: Wed, 12 Aug 2015 12:28:10 -0700
>
> Hi Ted,
>
> I will collect some info about jbd2 and get back to this list.
>
> To answer your questions, our workload is mostly sequential writes from a couple of daemons.
> Every second, we will create 6 to 10 files, and write around 8M bytes into them.
> We only do extend or truncate when needed, but I am sure we rarely do
> this on NVMe devices.
>
> Reading the kernel document, I also see ext4 ordered option will combine
> metadata and data blocks for journaling. If the data block is big, and
> the workload is heavy, will JBD2 be soaked?
>
> Thanks,
>
> Roy
>
> ----------------------------------------
>> Date: Wed, 12 Aug 2015 14:37:50 -0400
>> From: tytso@mit.edu
>> To: bsdnet@outlook.com
>> CC: linux-ext4@vger.kernel.org
>> Subject: Re: fsync stuck at jbd2_log_wait_commit on NVMe devices
>>
>> Hi Roy,
>>
>> My suggestion is to collect information from the jbd2_run_stats and
>> jbd2_checkpoint_stats tracepoints.
>>
>> cd /sys/kernel/debug/tracing
>> echo 1 > events/jbd2/jbd2_run_stats/enable
>> echo 1 > events/jbd2/jbd2_checkpoint_stats/enable
>> cat trace_pipe > /tmp/traces &
>> tail -f /tmp/traces
>>
>> The jbd2_handle_stats tracepoint can be informative, but it's also far
>> more voluminous.
>>
>> That will give us a hint where things are getting bottlenecked.
>>
>> What sort of workload is your application doing? Is it just primarily
>> doing random writes into a preallocated file? Is it creating or
>> deleting files? Extending or truncating files? etc.
>>
>> - Ted

^ permalink raw reply [flat|nested] 7+ messages in thread
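[Editor's note: jbd2_run_stats lines like the ones above are easier to scan with a small script than by eye. The sketch below parses the key/value tail of each trace line and flags commits where one phase exceeded a threshold; the field names come from the trace format shown in this thread, while the threshold and function names are invented for illustration.]

```python
import re

# numeric fields emitted by the jbd2_run_stats tracepoint, in order
FIELDS = ("wait", "request_delay", "running", "locked",
          "flushing", "logging", "handle_count", "blocks", "blocks_logged")

def parse_run_stats(line):
    """Return a dict of the numeric fields from one jbd2_run_stats trace
    line, or None if the line is not a jbd2_run_stats event."""
    if "jbd2_run_stats:" not in line:
        return None
    stats = {}
    for name in FIELDS:
        m = re.search(r"\b%s (\d+)" % name, line)
        if m:
            stats[name] = int(m.group(1))
    return stats

def worst_commits(lines, field="running", threshold=1000):
    """Yield the parsed stats of every commit whose `field` phase exceeded
    `threshold` jiffies (~milliseconds at HZ=1000)."""
    for line in lines:
        stats = parse_run_stats(line)
        if stats and stats.get(field, 0) > threshold:
            yield stats
```

Feeding it the `/tmp/traces` capture from Ted's earlier instructions would isolate the handful of slow commits (e.g. the `running 5000` / `handle_count 15613` event above) from the thousands of fast ones.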
* Re: fsync stuck at jbd2_log_wait_commit on NVMe devices
  2015-08-12 21:43 ` Roy Yang
@ 2015-08-12 22:34   ` Theodore Ts'o
  2015-08-12 23:34     ` Roy Yang
  0 siblings, 1 reply; 7+ messages in thread

From: Theodore Ts'o @ 2015-08-12 22:34 UTC (permalink / raw)
To: Roy Yang; +Cc: linux-ext4@vger.kernel.org

On Wed, Aug 12, 2015 at 02:43:22PM -0700, Roy Yang wrote:
> Hi Ted,
>
> I got some JBD2 tracing log by running our test program, most of them will take 10 seconds or so. Compared to the normal system, this test only takes 2 seconds.
> We saw the spike of running, but checking with kernel source. I don't get what it means. Any insights?

You need to get the trace on your normal system --- the system where,
under load, you are seeing the long times for a journal commit to
complete.  The jbd2_run_stats tracepoint gives us critical insight into
what might be going wrong --- but just as it doesn't help much to use a
diagnostic tool on a pet dog when you are trying to determine why a
child is running a fever, it doesn't help much to take traces on a test
program/system when we're trying to actually debug why your production
server is having trouble....

> <...>-220601 [015] .... 181555.139321: jbd2_run_stats: dev 259,3 tid 1064541 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 4 blocks 1 blocks_logged 2

There will be one of these trace points emitted for each jbd2
transaction commit.  Most of the fields, with the exception of the last
three (handle_count, blocks, and blocks_logged), are in units of
1000 HZ "jiffies".
So for example, if you don't have any fsync()'s, transactions by default
will run for 5 seconds before being committed, so the "running" time
will be a bit larger than 5000 jiffies (the timer goes off at 5 seconds,
but there may be some delay before the jbd2 kernel thread gets
scheduled):

jbd2/sda3-8-275 [003] ...2 70449.637938: jbd2_run_stats: dev 8,3 tid 789663 wait 0 request_delay 0 running 5030 locked 0 flushing 0 logging 0 handle_count 18 blocks 5 blocks_logged 6

The "wait" field indicates the maximum time that a handle associated
with a transaction had to wait before it could get started.  Typically
this is because we couldn't start a new transaction because the old
transaction was still in the process of being committed.

The "request_delay" field indicates the delay between when the
transaction commit was requested (typically due to an fsync) and when
the jbd2 kernel thread actually started working on the commit.  (Again,
all time fields are in units of jiffies, so you can read the integer in
milliseconds: 5030 jiffies == 5030 milliseconds == 5.03 seconds.)

The "running" field indicates how long the transaction was running; as
mentioned above, this will typically be around 5000 jiffies unless your
workload was doing fsync()'s, or the transaction was big enough that it
was forced to close for journal space reasons.

The "locked" field indicates how long it takes for all of the pending
handles to be completed.  When the jbd2 layer starts working on the
commit, the first thing it does is lock down the transaction so no new
handles can be started.  It then waits until all currently started
handles have completed.  While we are waiting for handles to close, no
other handles can be started (see the discussion of the maximum wait
time above).
We try to keep handle processing time as short as possible (for example,
by prefetching blocks before starting a handle), but there are cases
where this is not possible, or at least not easy.  Analysis of the
jbd2_handle_stats traces can help identify longer-running handles.  If
these long-running handles happen right before journal commit (and for a
workload that is overly fsync-happy, that may be most handles), they can
prolong how long a journal commit stays in the "locked" phase.

The "flushing" field indicates how long the transaction commit spent
writing out data blocks.  In data=ordered mode, we only need to write
out data blocks associated with inodes which had new blocks allocated
during the last transaction.

The "logging" field indicates how long it took to write out the blocks
that are actually written to the journal.  The number of blocks actually
written is reflected in the "blocks" and "blocks_logged" fields, where
the "blocks_logged" field includes the overhead blocks (jbd2 descriptor
blocks, jbd2 revoke blocks, etc.).

The "handle_count" field indicates how many handles are associated with
a particular transaction.  In general, each simple system call (chmod,
creat, chown, symlink, write, etc.) that modifies the file system
requires at most a single handle.  The big exception to this rule of
thumb is the truncate(2) and unlink(2) system calls, which may require
multiple handles in order to complete.  The truncate() and unlink()
system calls are also the ones which tend to hold a handle open for
longer-than-average amounts of time.

By the way, this is why calling truncate(), doing block allocations, and
doing fsync() a lot is *not* free.  If you are purchasing $$$ NVMe
storage, in general it's a wise idea for your application to simply do
non-allocating writes (i.e., overwriting previous blocks in a
preallocated file) and to track which blocks are used for what in your
application, instead of asking general-purpose file system code to do
this for you.
If you can use a specialized data structure (say, a circular buffer) and
manage the head and tail pointers yourself, it will almost always
require much less overhead than what the file system has to do, since
the file system doesn't have the advantage of specialized knowledge of
exactly what your requirements are for your workload.

Regards,

					- Ted

^ permalink raw reply [flat|nested] 7+ messages in thread
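[Editor's note: Ted's suggestion of non-allocating writes into a preallocated file, with the application managing its own pointers, could look roughly like the minimal sketch below. This is a naive circular log under stated assumptions: record framing, crash recovery, and the `FileRing` name are all inventions for illustration, not part of any real API.]

```python
import os

class FileRing:
    """Sketch of a circular log over a fully preallocated file; the
    application tracks the head offset itself instead of growing the
    file, so steady-state writes never allocate new blocks."""

    def __init__(self, path, size):
        self.size = size
        self.head = 0
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        # allocate every block up front so later writes are non-allocating
        if hasattr(os, "posix_fallocate"):  # Linux, Python 3.3+
            os.posix_fallocate(self.fd, 0, size)
        else:
            os.pwrite(self.fd, b"\0" * size, 0)

    def append(self, record):
        # wrap rather than extend the file: no block allocation, so the
        # following fdatasync need not drag allocation metadata (and, in
        # data=ordered mode, freshly allocated data) into the jbd2 commit
        if self.head + len(record) > self.size:
            self.head = 0
        os.pwrite(self.fd, record, self.head)
        self.head += len(record)
        os.fdatasync(self.fd)
```

Because `append()` only overwrites existing blocks, fdatasync() here is typically much cheaper than fsync() after an extending write, which is the distinction Ted is drawing.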
* RE: fsync stuck at jbd2_log_wait_commit on NVMe devices
  2015-08-12 22:34 ` Theodore Ts'o
@ 2015-08-12 23:34   ` Roy Yang
  2015-08-13 13:09     ` Theodore Ts'o
  0 siblings, 1 reply; 7+ messages in thread

From: Roy Yang @ 2015-08-12 23:34 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4@vger.kernel.org

Hi Ted,

I appreciate your time, and I will get the log from the normal workload.

BTW, the previous JBD2 log is from the node which is in a bad state. A
normal node finishes within 3 seconds, while on this node it takes 10
seconds.

Thank you very much!

Roy

----------------------------------------
> Date: Wed, 12 Aug 2015 18:34:56 -0400
> From: tytso@mit.edu
> To: bsdnet@outlook.com
> CC: linux-ext4@vger.kernel.org
> Subject: Re: fsync stuck at jbd2_log_wait_commit on NVMe devices
>
> On Wed, Aug 12, 2015 at 02:43:22PM -0700, Roy Yang wrote:
>> Hi Ted,
>>
>> I got some JBD2 tracing log by running our test program, most of them will take 10 seconds or so. Compared to the normal system, this test only takes 2 seconds.
>> We saw the spike of running, but checking with kernel source. I don't get what it means. Any insights?
>
> You need to get the trace on your normal system --- a system when
> under load, you are seeing the long times for a journal commit to
> complete. The jbd2_run_stats gives us critical insight into what
> might be going wrong --- but just as it doesn't help much to use a
> diagnostic tool on a pet dog when you are trying to determine why the
> child is running a fever, it doesn't help much to take traces on a
> test program/system when we're trying to actually debug why your
> production server is having trouble....
>
>> <...>-220601 [015] .... 181555.139321: jbd2_run_stats: dev 259,3 tid 1064541 wait 0 request_delay 0 running 0 locked 0 flushing 0 logging 0 handle_count 4 blocks 1 blocks_logged 2
>
> There will be one of these trace points emitted for each jbd2
> transaction commit.
> Most of the fields, with the exception of the
> last three (handle_count, blocks, and blocks_logged) are in units of
> 1000 HZ "jiffies".  So for example, if you don't have any fsync()'s,
> transactions by default will run for 5 seconds before being committed,
> so the "running" time will be a bit larger than 5000 jiffies (the
> timer goes off at 5 seconds, but there may be some delay before the
> jbd2 kernel thread gets scheduled):
>
> jbd2/sda3-8-275 [003] ...2 70449.637938: jbd2_run_stats: dev 8,3 tid 789663 wait 0 request_delay 0 running 5030 locked 0 flushing 0 logging 0 handle_count 18 blocks 5 blocks_logged 6
>
> The "wait" field indicates the maximum time that a handle associated
> with a transaction had to wait before it could get started. Typically
> this is because we couldn't start a new transaction because the old
> transaction was still in the process of being committed.
>
> The "request_delay" field indicates the delay between when transaction
> commit was requested (typically due to an fsync), and when the jbd2
> kernel thread actually started working on the commit. (Again, all time
> fields are in units of jiffies, so you can read the integer in
> milliseconds: 5030 jiffies == 5030 milliseconds == 5.03 seconds.)
>
> The "running" field indicates how long the transaction was running; as
> mentioned above, this will typically be around 5000 jiffies unless
> your workload was doing fsync()'s, or if the transaction was big
> enough that it was forced to close for journal space reasons.
>
> The "locked" field indicates how long it takes for all of the pending
> handles to be completed. When the jbd2 layer starts working on the
> commit, the first thing it does is lock down the transaction so no new
> handles can be started. It then waits until all currently started
> handles have completed. While we are waiting for handles to close, no
> other handles can be started (see discussion of the maximum wait time
> above).
> We try to keep handle processing time as short as possible
> (for example, by prefetching blocks before starting a handle), but
> there are cases where this is not possible, or at least not easy.
> Analysis of the jbd2_handle_stats traces can help identify
> longer-running handles. If these long-running handles happen right
> before journal commit (and for a workload that is overly fsync-happy,
> that may be most handles), they can prolong how long a journal commit
> stays in the "locked" phase.
>
> The "flushing" field indicates how long the transaction commit spent
> writing out data blocks. We only need to write out data blocks in
> data=ordered mode associated with inodes which have new blocks
> allocated during the last transaction.
>
> The "logging" field indicates how long it took to write out the blocks
> that are actually written to the journal. The number of blocks
> actually written is reflected in the "blocks" and "blocks_logged"
> fields, where the "blocks_logged" field includes the overhead blocks
> (jbd2 descriptor blocks, jbd2 revoke blocks, etc.)
>
> The "handle_count" field indicates how many handles are associated
> with a particular transaction. In general each simple system call
> (chmod, creat, chown, symlink, write, etc.) that modifies the file
> system requires at most a single handle. The big exception to this
> rule of thumb is the truncate(2) and unlink(2) system calls, which may
> require multiple handles in order to complete. The truncate() and
> unlink() system calls are also the ones which tend to hold a handle
> open for longer-than-average amounts of time.
>
> By the way, this is why calling truncate() and doing block allocations
> and doing fsync() a lot is *not* free.
> If you are purchasing $$$ NVMe
> storage, in general it's a wise idea for your application to simply do
> non-allocating writes (i.e., overwriting previous blocks in a
> preallocated file), and tracking which blocks are used for what in
> your application, instead of trying to ask general purpose file system
> code to do this for you. If you can use a specialized data structure
> (say, a circular buffer) and manage the head and tail pointers
> yourself, it will almost always require much less overhead than what
> the file system will have to do since the file system doesn't have the
> advantage of specialized knowledge of exactly what your requirements
> might be for your workload.
>
> Regards,
>
> - Ted

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: fsync stuck at jbd2_log_wait_commit on NVMe devices
  2015-08-12 23:34 ` Roy Yang
@ 2015-08-13 13:09   ` Theodore Ts'o
  0 siblings, 0 replies; 7+ messages in thread

From: Theodore Ts'o @ 2015-08-13 13:09 UTC (permalink / raw)
To: Roy Yang; +Cc: linux-ext4@vger.kernel.org

On Wed, Aug 12, 2015 at 04:34:17PM -0700, Roy Yang wrote:
> Hi Ted,
>
> Appreciate your time, and will get the log from normal work load.
>
> BTW, the previous JBD2 log is from the node which is in bad state.
> Normal node finish within 3 seconds; while on this node, it takes 10 seconds.

Ah, you didn't say that.  If this is a node in a bad state, and we're
not seeing any real delays in the jbd2 logs, then it sounds very much
like the problem is a hardware issue (or maybe an NVMe device driver
issue).  You might want to use blktrace to see how long the underlying
I/O operations are taking to complete.

					- Ted

^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-08-13 13:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-12 16:57 fsync stuck at jbd2_log_wait_commit on NVMe devices Roy Yang
2015-08-12 18:37 ` Theodore Ts'o
2015-08-12 19:28   ` Roy Yang
2015-08-12 21:43     ` Roy Yang
2015-08-12 22:34       ` Theodore Ts'o
2015-08-12 23:34         ` Roy Yang
2015-08-13 13:09           ` Theodore Ts'o