linux-fsdevel.vger.kernel.org archive mirror
From: Kemeng Shi <shikemeng@huaweicloud.com>
To: Jan Kara <jack@suse.cz>, Brian Foster <bfoster@redhat.com>
Cc: akpm@linux-foundation.org, willy@infradead.org, tj@kernel.org,
	dsterba@suse.com, mjguzik@gmail.com, dhowells@redhat.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 3/6] writeback: support retrieving per group debug writeback stats of bdi
Date: Sun, 7 Apr 2024 11:13:41 +0800
Message-ID: <6bf2280d-bce1-c1c5-3b25-8cfc7e1fa81d@huaweicloud.com>
In-Reply-To: <20240404090753.q3iugmqeeqig64db@quack3>



on 4/4/2024 5:07 PM, Jan Kara wrote:
> On Wed 03-04-24 11:04:58, Brian Foster wrote:
>> On Wed, Apr 03, 2024 at 04:49:42PM +0800, Kemeng Shi wrote:
>>> on 3/29/2024 9:10 PM, Brian Foster wrote:
>>>> On Wed, Mar 27, 2024 at 11:57:48PM +0800, Kemeng Shi wrote:
>>>>> +		collect_wb_stats(&stats, wb);
>>>>> +
>>>>
>>>> Also, similar question as before on whether you'd want to check
>>>> WB_registered or something here..
>>> Still prefer to keep full debug info and user could filter out on
>>> demand.
>>
>> Ok. I was more wondering if that was needed for correctness. If not,
>> then that seems fair enough to me.
>>
>>>>> +		if (mem_cgroup_wb_domain(wb) == NULL) {
>>>>> +			wb_stats_show(m, wb, &stats);
>>>>> +			continue;
>>>>> +		}
>>>>
>>>> Can you explain what this logic is about? Is the cgwb_calc_thresh()
>>>> thing not needed in this case? A comment might help for those less
>>>> familiar with the implementation details.
>>> If mem_cgroup_wb_domain(wb) is NULL, then it's bdi->wb, otherwise,
>>> it's wb in cgroup. For bdi->wb, there is no need to do wb_tryget
>>> and cgwb_calc_thresh. Will add some comment in next version.
>>>>
>>>> BTW, I'm also wondering if something like the following is correct
>>>> and/or roughly equivalent:
>>>> 	
>>>> 	list_for_each_*(wb, ...) {
>>>> 		struct wb_stats stats = ...;
>>>>
>>>> 		if (!wb_tryget(wb))
>>>> 			continue;
>>>>
>>>> 		collect_wb_stats(&stats, wb);
>>>>
>>>> 		/*
>>>> 		 * Extra wb_thresh magic. Drop rcu lock because ... . We
>>>> 		 * can do so here because we have a ref.
>>>> 		 */
>>>> 		if (mem_cgroup_wb_domain(wb)) {
>>>> 			rcu_read_unlock();
>>>> 			stats.wb_thresh = min(stats.wb_thresh, cgwb_calc_thresh(wb));
>>>> 			rcu_read_lock();
>>>> 		}
>>>>
>>>> 		wb_stats_show(m, wb, &stats)
>>>> 		wb_put(wb);
>>>> 	}
>>> It's correct as wb_tryget to bdi->wb has no harm. I have considered
>>> to do it in this way, I change my mind to do it in new way for
>>> two reason:
>>> 1. Put code handling wb in cgroup more tight which could be easier
>>> to maintain.
>>> 2. Rmove extra wb_tryget/wb_put for wb in bdi.
>>> Would this make sense to you?
>>
>> Ok, well assuming it is correct the above logic is a bit more simple and
>> readable to me. I think you'd just need to fill in the comment around
>> the wb_thresh thing rather than i.e. having to explain we don't need to
>> ref bdi->wb even though it doesn't seem to matter.
>>
>> I kind of feel the same on the wb_stats file thing below just because it
>> seems more consistent and available if wb_stats eventually grows more
>> wb-specific data.
>>
>> That said, this is subjective and not hugely important so I don't insist
>> on either point. Maybe wait a bit and see if Jan or Tejun or somebody
>> has any thoughts..? If nobody else expresses explicit preference then
>> I'm good with it either way.
> 
> No strong opinion from me really.
> 
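For anyone reading along, here is a rough userspace sketch of the loop
shape Brian suggested above. It is not the patch and it is not kernel
code: every mock_* name, the refcnt field and the cg_thresh field are
stand-ins made up for illustration. refcnt == 0 models a wb for which
wb_tryget() would fail, in_cgroup models mem_cgroup_wb_domain(wb) being
non-NULL, and cg_thresh models whatever cgwb_calc_thresh(wb) would return.
RCU locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins only; none of these are the kernel definitions. */
struct mock_wb {
	int refcnt;              /* 0 models a wb that can no longer be pinned */
	bool in_cgroup;          /* models mem_cgroup_wb_domain(wb) != NULL */
	unsigned long wb_thresh; /* threshold seen by collect_wb_stats() */
	unsigned long cg_thresh; /* models the result of cgwb_calc_thresh() */
};

struct mock_stats {
	unsigned long wb_thresh;
};

/* Take a reference unless the wb is already going away. */
static bool mock_wb_tryget(struct mock_wb *wb)
{
	if (wb->refcnt == 0)
		return false;
	wb->refcnt++;
	return true;
}

/* Drop the reference taken by mock_wb_tryget(). */
static void mock_wb_put(struct mock_wb *wb)
{
	assert(wb->refcnt > 0);
	wb->refcnt--;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Walk all wbs, pin each one, collect its stats and clamp wb_thresh
 * for cgroup wbs only. Returns how many entries were written to out[].
 */
static size_t mock_collect_all(struct mock_wb *wbs, size_t n,
			       struct mock_stats *out)
{
	size_t shown = 0;

	for (size_t i = 0; i < n; i++) {
		struct mock_wb *wb = &wbs[i];
		struct mock_stats stats = { 0 };

		if (!mock_wb_tryget(wb))
			continue;

		stats.wb_thresh = wb->wb_thresh;
		if (wb->in_cgroup)
			stats.wb_thresh = min_ul(stats.wb_thresh,
						 wb->cg_thresh);

		out[shown++] = stats;
		mock_wb_put(wb);
	}
	return shown;
}
```

The point of the sketch is only that the tryget/put pair is applied
uniformly, so the bdi->wb case needs no separate branch; the cost is one
extra get/put on bdi->wb.
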
>>>>> +static void cgwb_debug_register(struct backing_dev_info *bdi)
>>>>> +{
>>>>> +	debugfs_create_file("wb_stats", 0444, bdi->debug_dir, bdi,
>>>>> +			    &cgwb_debug_stats_fops);
>>>>> +}
>>>>> +
>>>>>  static void bdi_collect_stats(struct backing_dev_info *bdi,
>>>>>  			      struct wb_stats *stats)
>>>>>  {
>>>>> @@ -117,6 +202,8 @@ static void bdi_collect_stats(struct backing_dev_info *bdi,
>>>>>  {
>>>>>  	collect_wb_stats(stats, &bdi->wb);
>>>>>  }
>>>>> +
>>>>> +static inline void cgwb_debug_register(struct backing_dev_info *bdi) { }
>>>>
>>>> Could we just create the wb_stats file regardless of whether cgwb is
>>>> enabled? Obviously theres only one wb in the !CGWB case and it's
>>>> somewhat duplicative with the bdi stats file, but that seems harmless if
>>>> the same code can be reused..? Maybe there's also a small argument for
>>>> dropping the state info from the bdi stats file and moving it to
>>>> wb_stats.
>>> In backing-dev.c, there are a lot "#ifdef CGWB .. #else .. #endif" to
>>> avoid unneed extra cost when CGWB is not enabled.
>>> I think it's better to avoid extra cost from wb_stats when CGWB is not
>>> enabled. For now, we only save cpu cost to create and destroy wb_stats
>>> and save memory cost to record debugfs file, we could save more in
>>> future when wb_stats records more debug info.
> 
> Well, there's the other side that you don't have to think whether the
> kernel has CGWB enabled or not when asking a customer to gather the
> writeback debug info - you can always ask for wb_stats. Also if you move
> the wb->state to wb_stats only it will become inaccessible with CGWB
> disabled. So I agree with Brian that it is better to provide wb_stats also
> with CGWB disabled (and we can just implement wb_stats for !CGWB case with
> the same function as bdi_stats).
> 
> That being said all production kernels I have seen do have CGWB enabled so
> I don't care that much about this...
It's acceptable to me if the extra cost is tolerable.
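To illustrate why the same collection code can serve both files in the
!CGWB case, here is a small userspace sketch. It assumes the bdi-level
stats are a plain sum over all wbs of the bdi, as patch 2/6 intends; the
mock_* names and the two counters are hypothetical and are not the real
struct wb_stats fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters; the real struct wb_stats differs. */
struct mock_wb {
	unsigned long nr_dirty;
	unsigned long nr_writeback;
};

struct mock_wb_stats {
	unsigned long nr_dirty;
	unsigned long nr_writeback;
};

/* Accumulate one wb into stats, in the spirit of collect_wb_stats(). */
static void mock_collect_wb_stats(struct mock_wb_stats *stats,
				  const struct mock_wb *wb)
{
	assert(stats && wb);
	stats->nr_dirty += wb->nr_dirty;
	stats->nr_writeback += wb->nr_writeback;
}

/* Sum over every wb of the bdi, in the spirit of bdi_collect_stats(). */
static struct mock_wb_stats mock_bdi_collect_stats(const struct mock_wb *wbs,
						   size_t n)
{
	struct mock_wb_stats stats = { 0, 0 };

	for (size_t i = 0; i < n; i++)
		mock_collect_wb_stats(&stats, &wbs[i]);
	return stats;
}
```

With a single wb the sum is just that wb's own counters, so under this
assumption bdi_stats and wb_stats would report the same numbers when CGWB
is disabled.
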
> 
>>> Move state info from bdi stats to wb_stats make senses to me. The only
>>> concern would be compatibility problem. I will add a new patch to this
>>> to make this more noticeable and easier to revert.
> 
> Yeah, I don't think we care much about debugfs compatibility but I think
> removing state from bdi_stats is not worth the inconsistency between
> wb_stats and bdi_stats in the !CGWB case.
OK, I will simply keep wb_stats even when CGWB is not enabled, and keep
state in both bdi_stats and wb_stats, unless Brian objects in the next few
days.

Kemeng
> 
> 								Honza
> 


