public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Steffen Maier <maier@linux.ibm.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	Li Zefan <lizf@cn.fujitsu.com>, Li Zefan <lizefan@huawei.com>
Subject: Re: [PATCH 2/2] tracing/events: block: dev_t via driver core for plug and unplug events
Date: Mon, 16 Apr 2018 18:33:27 +0200	[thread overview]
Message-ID: <59186bf6-abf1-87b0-914d-eed1b40ef4a8@linux.ibm.com> (raw)
In-Reply-To: <20180415083154.GA12254@kroah.com>

Hi Greg,

On 04/15/2018 10:31 AM, Greg Kroah-Hartman wrote:
> On Fri, Apr 13, 2018 at 03:07:18PM +0200, Steffen Maier wrote:
>> Complements v2.6.31 commit 55782138e47d ("tracing/events: convert bloc=
k
>> trace points to TRACE_EVENT()") to be equivalent to traditional blktra=
ce
>> output. Also this allows event filtering to not always get all (un)plu=
g
>> events.
>>
>> NB: The NULL pointer check for q->kobj.parent is certainly racy and
>> I don't have enough experience if it's good enough for a trace event.
>> The change did work for my cases (block device read/write I/O on
>> zfcp-attached SCSI disks and dm-mpath on top).
>>
>> While I haven't seen any prior art using driver core (parent) relation=
s
>> for trace events, there are other cases using this when no direct poin=
ter
>> exists between objects, such as:
>>   #define to_scsi_target(d)	container_of(d, struct scsi_target, dev)
>>   static inline struct scsi_target *scsi_target(struct scsi_device *sd=
ev)
>>   {
>> 	return to_scsi_target(sdev->sdev_gendev.parent);
>>   }
>=20
> That is because you "know" the parent of a target device is a
> scsi_target.

true

>> This is the object model we make use of here:
>>
>> struct gendisk {
>>          struct hd_struct {
>>                  struct device {      /*container_of*/
>>                          struct kobject kobj; <--+
>>                          dev_t  devt; /*deref*/  |
>>                  } __dev;                        |
>>          } part0;                                |
>>          struct request_queue *queue; ..+        |
>> }                                      :        |
>>                                         :        |
>> struct request_queue {  <..............+        |
>>          /* queue kobject */                     |
>>          struct kobject {                        |
>>                  struct kobject *parent; --------+
>=20
> Are you sure about this?

I double checked it with crash on a running system chasing pointers and=20
looking at structure debug symbols.
But of course I cannot guarantee it's always been like this and will be.

>>          } kobj;
>> }
>>

>> The difference to blktrace parsed output is that block events don't us=
e the
>> partition's minor number but the containing block device's minor numbe=
r:
>=20
> Why do you want the block device's minor number here?  What is wrong
> with the partition's minor number?  I would think you want that instead=
=2E

No change introduced with my patch. I just describe state of the art=20
since the mentioned=20
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit=
/?id=3D55782138e47d.
It (or even its predecessor) used request_queue as trace function=20
argument (plus mostly either request or bio). So that's the currently=20
available context for these events. My change is consistent with that.
But then again, it's not much of a problem as we do have the remap event =

which shows the mapping from partition to blockdev.

blktrace, hooking with callbacks on the block trace events, has its own=20
context information [struct blk_trace] and can get to e.g. the dev_t=20
with its own real pointers without using driver core relations. But I=20
had the impression that's only wired if one uses the blktrace IOCTL or=20
the blk tracer [do_blk_trace_setup()], not for "pure" block events.
> static void blk_add_trace_plug(void *ignore, struct request_queue *q)
> {
> 	struct blk_trace *bt =3D q->blk_trace;
                                ^^^^^^^^^^^^
>=20
> 	if (bt)
> 		__blk_add_trace(bt, 0, 0, 0, 0, BLK_TA_PLUG, 0, 0, NULL, NULL);
> }
>=20
> static void blk_add_trace_unplug(void *ignore, struct request_queue *q,=

> 				    unsigned int depth, bool explicit)
> {
> 	struct blk_trace *bt =3D q->blk_trace;
                                ^^^^^^^^^^^^
>=20
> 	if (bt) {
> 		__be64 rpdu =3D cpu_to_be64(depth);
> 		u32 what;
>=20
> 		if (explicit)
> 			what =3D BLK_TA_UNPLUG_IO;
> 		else
> 			what =3D BLK_TA_UNPLUG_TIMER;
>=20
> 		__blk_add_trace(bt, 0, 0, 0, 0, what, 0, sizeof(rpdu), &rpdu, NULL);
> 	}
> }

> struct blk_trace {
> 	int trace_state;
> 	struct rchan *rchan;
> 	unsigned long __percpu *sequence;
> 	unsigned char __percpu *msg_data;
> 	u16 act_mask;
> 	u64 start_lba;
> 	u64 end_lba;
> 	u32 pid;
> 	u32 dev;
             ^^^
> 	struct dentry *dir;
> 	struct dentry *dropped_file;
> 	struct dentry *msg_file;
> 	struct list_head running_list;
> 	atomic_t dropped;
> };



>> $ dd if=3D/dev/sdf1 count=3D1
>>
>> $ cat /sys/kernel/debug/tracing/trace
>> block_bio_remap: 8,80 R 2048 + 32 <- (8,81) 0
>> block_bio_queue: 8,80 R 2048 + 32 [dd]
>> block_getrq: 8,80 R 2048 + 32 [dd]
>> block_plug: 8,80 [dd]
>>              ^^^^
>> block_rq_insert: 8,80 R 16384 () 2048 + 32 [dd]
>> block_unplug: 8,80 [dd] 1 explicit
>>                ^^^^
>> block_rq_issue: 8,80 R 16384 () 2048 + 32 [dd]
>> block_rq_complete: 8,80 R () 2048 + 32 [0]

>> diff --git a/include/trace/events/block.h b/include/trace/events/block=
=2Eh
>> index a13613d27cee..cffedc26e8a3 100644
>> --- a/include/trace/events/block.h
>> +++ b/include/trace/events/block.h
>> @@ -460,14 +460,18 @@ TRACE_EVENT(block_plug,
>>   	TP_ARGS(q),
>>  =20
>>   	TP_STRUCT__entry(
>> +		__field( dev_t,		dev			)
>>   		__array( char,		comm,	TASK_COMM_LEN	)
>>   	),
>>  =20
>>   	TP_fast_assign(
>> +		__entry->dev =3D q->kobj.parent ?
>> +		container_of(q->kobj.parent, struct device, kobj)->devt : 0;
>=20
> That really really really scares me.  It feels very fragile and messing=

> with parent pointers is ripe for things breaking in the future in odd
> and unexplainable ways.
>=20
> And how can the parent be NULL?

I'm hoping for help by block layer experts.

I suppose block device unplug/removal could be a case. But I don't know=20
the details how this works and if object release is protected while I/O=20
is pending and new I/O is rejected beforehand. That might make my=20
approach safe as it would not call the trace functions while the parent=20
pointer changes.

> I don't know the block layer but this feels very wrong to me.  Are you
> sure there isn't some other way to get this info?

No, I'm not sure at all. But I'm no block layer expert either. This is=20
just an idea I had which did work for my cases and I'm looking for=20
confirmation or denial by the experts. So if it's not safe from a block=20
layer point of view either, then I have to ditch it.

--=20
Mit freundlichen Gr=C3=BC=C3=9Fen / Kind regards
Steffen Maier

Linux on z Systems Development

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

  reply	other threads:[~2018-04-16 16:33 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-13 13:07 [PATCH 0/2] tracing/events: block: bring more on a par with blktrace Steffen Maier
2018-04-13 13:07 ` [PATCH 1/2] tracing/events: block: track and print if unplug was explicit or schedule Steffen Maier
2018-04-13 14:16   ` Steven Rostedt
2018-04-13 13:07 ` [PATCH 2/2] tracing/events: block: dev_t via driver core for plug and unplug events Steffen Maier
2018-04-15  8:31   ` Greg Kroah-Hartman
2018-04-16 16:33     ` Steffen Maier [this message]
2018-04-19 19:24       ` Omar Sandoval
2018-04-19 20:56         ` Bart Van Assche
2018-04-24 14:49           ` Steffen Maier
2018-04-27 16:38             ` Bart Van Assche
2018-04-13 13:07 ` [RFC] tracing/events: block: also try to get dev_t via driver core for some events Steffen Maier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59186bf6-abf1-87b0-914d-eed1b40ef4a8@linux.ibm.com \
    --to=maier@linux.ibm.com \
    --cc=axboe@kernel.dk \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=lizf@cn.fujitsu.com \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox