// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright IBM Corp. 2018
 * Auxtrace support for s390 CPU-Measurement Sampling Facility
 *
 * Author(s):  Thomas Richter <tmricht@linux.ibm.com>
 *
 * Auxiliary traces are collected during 'perf record' using rbd000 event.
 * Several PERF_RECORD_XXX records are generated during recording:
 *
 * PERF_RECORD_AUX:
 *	Records that new data landed in the AUX buffer part.
 * PERF_RECORD_AUXTRACE:
 *	Defines auxtrace data. Followed by the actual data. The contents of
 *	the auxtrace data depend on the event and the CPU.
 *	This record is generated by the perf record command. For details
 *	see Documentation/perf.data-file-format.txt.
 * PERF_RECORD_AUXTRACE_INFO:
 *	Defines a table of contents for PERF_RECORD_AUXTRACE records. This
 *	record is generated by the 'perf record' command. Each record contains
 *	up to 256 entries describing offset and size of the AUXTRACE data in the
 *	perf.data file.
 * PERF_RECORD_AUXTRACE_ERROR:
 *	Indicates an error during AUXTRACE collection such as buffer overflow.
 * PERF_RECORD_FINISHED_ROUND:
 *	Perf events are not necessarily in time stamp order, as they can be
 *	collected in parallel on different CPUs. If the events should be
 *	processed in time order they need to be sorted first.
 *	Perf report guarantees that there is no reordering over a
 *	PERF_RECORD_FINISHED_ROUND boundary event. All perf records with a
 *	time stamp lower than this record are processed (and displayed) before
 *	the succeeding perf records are processed.
 *
 * These records are evaluated by the 'perf report' command.
 *
 * 1. PERF_RECORD_AUXTRACE_INFO is used to set up the infrastructure for
 * auxiliary trace data processing. See s390_cpumsf_process_auxtrace_info()
 * below.
 * Auxiliary trace data is collected per CPU. To merge the data into the report
 * an auxtrace_queue is created for each CPU. It is assumed that the auxtrace
 * data is in ascending order.
 *
 * Each queue has a doubly linked list of auxtrace_buffers.
 * This list contains the offset and size of a CPU's auxtrace data. During
 * auxtrace processing the data portion is mmap()'ed.
 *
 * To sort the queues in chronological order, all queue access is controlled
 * by the auxtrace_heap. This is basically a stack; each stack element has two
 * entries, the queue number and a time stamp. However, the stack is sorted by
 * the time stamps: the highest time stamp is at the bottom and the lowest
 * (nearest) time stamp is at the top. That sort order is maintained at all
 * times!
 *
 * After the auxtrace infrastructure has been set up, the auxtrace queues are
 * filled with data (offset/size pairs) and the auxtrace_heap is populated.
 *
 * 2. PERF_RECORD_XXX processing triggers access to the auxtrace_queues.
 * Each record is handled by s390_cpumsf_process_event(). The time stamp of
 * the perf record is compared with the time stamp located on the auxtrace_heap
 * top element. If that time stamp is lower than the time stamp from the
 * record sample, the auxtrace queues will be processed. As auxtrace queues
 * control many auxtrace_buffers and each buffer can be quite large, the
 * auxtrace buffer might be processed only partially. In this case the
 * position in the auxtrace_buffer of that queue is remembered and the time
 * stamp of the last processed entry of the auxtrace_buffer replaces the
 * current auxtrace_heap top.
 *
 * 3. Auxtrace_queues might run out of data and are fed by the
 * PERF_RECORD_AUXTRACE handling, see s390_cpumsf_process_auxtrace_event().
 *
 * Event Generation
 * Each sampling-data entry in the auxiliary trace data generates a perf sample.
 * This sample is filled with data from the auxtrace such as PID/TID,
 * instruction address, CPU state, etc. This sample is processed with
 * perf_session__deliver_synth_event() to be included in the GUI.
 *
 * 4. PERF_RECORD_FINISHED_ROUND event is used to process all the remaining
 * auxiliary trace entries until the time stamp on the auxtrace_heap top
 * reaches the time stamp of this record.
 * This is triggered by ordered_event->deliver().
 *
 *
 * Perf event processing.
 * Event processing of PERF_RECORD_XXX entries relies on time stamp entries.
 * This is the function call sequence:
 *
 * __cmd_report()
 *   |
 * perf_session__process_events()
 *   |
 * __perf_session__process_events()
 *   |
 * perf_session__process_event()
 *   |  This function splits the PERF_RECORD_XXX records.
 *   |  - Those generated by perf record command (type number equal or higher
 *   |    than PERF_RECORD_USER_TYPE_START) are handled by
 *   |    perf_session__process_user_event() (see below)
 *   |  - Those generated by the kernel are handled by
 *   |    evlist__parse_sample_timestamp()
 *   |
 * evlist__parse_sample_timestamp()
 *   |  Extract time stamp from sample data.
 *   |
 * perf_session__queue_event()
 *   |  If timestamp is positive the sample is entered into an ordered_event
 *   |  list, sorted by timestamp. The event processing is deferred until
 *   |  later (see perf_session__process_user_event()).
 *   |  Other timestamps (0 or -1) are handled immediately by
 *   |  perf_session__deliver_event(). These are events generated at start up
 *   |  of command perf record. They create PERF_RECORD_COMM and PERF_RECORD_MMAP*
 *   |  records. They are needed to create a list of running processes and their
 *   |  memory mappings and layout. They are needed at the beginning to enable
 *   |  command perf report to create process trees and memory mappings.
 *   |
 * perf_session__deliver_event()
 *   |  Delivers a PERF_RECORD_XXX entry for handling.
 *   |
 * auxtrace__process_event()
 *   |  The timestamp of the PERF_RECORD_XXX entry is taken to correlate with
 *   |  time stamps from the auxiliary trace buffers. This enables
 *   |  synchronization between auxiliary trace data and the events in the
 *   |  perf.data file.
 *   |
 * machine__deliver_event()
 *   |  Handles the PERF_RECORD_XXX event. This depends on the record type.
 *      It might update the process tree, update a process memory map or enter
 *      a sample with IP and call back chain data into GUI data pool.
 *
 *
 * Deferred processing determined by perf_session__process_user_event() is
 * finally processed when a PERF_RECORD_FINISHED_ROUND is encountered. These
 * are generated during command perf record.
 * The timestamp of PERF_RECORD_FINISHED_ROUND event is taken to process all
 * PERF_RECORD_XXX entries stored in the ordered_event list. This list was
 * built up while reading the perf.data file.
 * Each event is now processed by calling perf_session__deliver_event().
 * This enables time synchronization between the data in the perf.data file and
 * the data in the auxiliary trace buffers.
*/
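The comment above describes the auxtrace_heap as a stack kept sorted by time stamp, with the lowest time stamp on top. Only the top element matters for the processing steps described, and a binary min-heap guarantees that the entry with the lowest time stamp is always on top. The following is a minimal, hypothetical sketch: the names ts_heap, ts_heap_push() and ts_heap_pop() and the fixed capacity are assumptions for illustration, not perf's auxtrace API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration: a binary min-heap of (queue number, time stamp)
 * pairs. The element with the lowest time stamp is always at index 0 ("top").
 * Names and the fixed capacity are assumptions, not perf's API.
 */
#define TS_HEAP_MAX 64

struct ts_heap_item {
	unsigned int queue_nr;
	uint64_t ts;
};

struct ts_heap {
	struct ts_heap_item items[TS_HEAP_MAX];
	size_t nr;
};

static void ts_heap_swap(struct ts_heap_item *a, struct ts_heap_item *b)
{
	struct ts_heap_item t = *a;

	*a = *b;
	*b = t;
}

/* Insert an element and sift it up until its parent is not larger. */
static int ts_heap_push(struct ts_heap *h, unsigned int queue_nr, uint64_t ts)
{
	size_t i;

	if (h->nr >= TS_HEAP_MAX)
		return -1;
	i = h->nr++;
	h->items[i].queue_nr = queue_nr;
	h->items[i].ts = ts;
	while (i > 0 && h->items[(i - 1) / 2].ts > h->items[i].ts) {
		ts_heap_swap(&h->items[(i - 1) / 2], &h->items[i]);
		i = (i - 1) / 2;
	}
	return 0;
}

/* Remove the top (lowest time stamp) and sift the last element down. */
static int ts_heap_pop(struct ts_heap *h, struct ts_heap_item *top)
{
	size_t i = 0;

	if (h->nr == 0)
		return -1;
	*top = h->items[0];
	h->items[0] = h->items[--h->nr];
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, min = i;

		if (l < h->nr && h->items[l].ts < h->items[min].ts)
			min = l;
		if (r < h->nr && h->items[r].ts < h->items[min].ts)
			min = r;
		if (min == i)
			break;
		ts_heap_swap(&h->items[i], &h->items[min]);
		i = min;
	}
	return 0;
}
```

Step 2 of the comment maps onto pop-then-push: pop the queue with the nearest time stamp, decode it until its next trailer time stamp exceeds the event time stamp, then push the queue back with that new time stamp.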
/* Check if the raw data should be dumped to file. If this is the case and
 * the file to dump to has not been opened for writing, do so.
 *
 * Return 0 on success and greater than zero on error so processing continues.
 */
static int s390_cpumcf_dumpctr(struct s390_cpumsf *sf,
			       struct perf_sample *sample)
{
	struct s390_cpumsf_queue *sfq;
	struct auxtrace_queue *q;
	int rc = 0;

	if (!sf->use_logfile || sf->queues.nr_queues <= sample->cpu)
		return rc;

	q = &sf->queues.queue_array[sample->cpu];
	sfq = q->priv;
	if (!sfq)		/* Queue not yet allocated */
		return rc;

	if (!sfq->logfile_ctr) {
		char *name;

		rc = (sf->logdir)
			? asprintf(&name, "%s/aux.ctr.%02x",
				   sf->logdir, sample->cpu)
			: asprintf(&name, "aux.ctr.%02x", sample->cpu);
		if (rc < 0)		/* name is undefined if asprintf() fails */
			return 1;
		rc = 0;			/* Discard the length returned by asprintf() */
		sfq->logfile_ctr = fopen(name, "w");
		if (sfq->logfile_ctr == NULL) {
			pr_err("Failed to open counter set log file %s, "
			       "continue...\n", name);
			rc = 1;
		}
		free(name);
	}

	if (sfq->logfile_ctr) {
		/* See comment above for -4 */
		size_t n = fwrite(sample->raw_data, sample->raw_size - 4, 1,
				  sfq->logfile_ctr);
		if (n != 1) {
			pr_err("Failed to write counter set data\n");
			rc = 1;
		}
	}
	return rc;
}
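The per-CPU log file naming used by s390_cpumcf_dumpctr() can also be expressed with a caller-supplied buffer instead of asprintf(). This is a hypothetical helper (ctr_log_name() is not part of perf); it uses snprintf() so that no allocated name can be left undefined on failure, and it reports truncation instead of silently cutting the name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the naming scheme of s390_cpumcf_dumpctr():
 * "<logdir>/aux.ctr.<cpu>" when a log directory is configured, otherwise
 * "aux.ctr.<cpu>" in the current directory. The CPU number is printed as
 * at least two lowercase hex digits ("%02x").
 * Returns 0 on success, -1 on an encoding error or if the name does not fit.
 */
static int ctr_log_name(char *buf, size_t len, const char *logdir,
			unsigned int cpu)
{
	int n;

	if (logdir)
		n = snprintf(buf, len, "%s/aux.ctr.%02x", logdir, cpu);
	else
		n = snprintf(buf, len, "aux.ctr.%02x", cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For CPU 10 without a configured directory this yields aux.ctr.0a; with the directory /tmp/dump and CPU 3 it yields /tmp/dump/aux.ctr.03.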
/* Display s390 CPU measurement facility basic-sampling data entry.
 * Data is written on s390 in big-endian byte order and contains bit
 * fields across byte boundaries.
 */
static bool s390_cpumsf_basic_show(const char *color, size_t pos,
				   struct hws_basic_entry *basicp)
{
	struct hws_basic_entry *basic = basicp;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	struct hws_basic_entry local;
	unsigned long long word = be64toh(*(unsigned long long *)basicp);
/* Display s390 CPU measurement facility diagnostic-sampling data entry.
 * Data is written on s390 in big-endian byte order and contains bit
 * fields across byte boundaries.
 */
static bool s390_cpumsf_diag_show(const char *color, size_t pos,
				  struct hws_diag_entry *diagp)
{
	struct hws_diag_entry *diag = diagp;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	struct hws_diag_entry local;
	unsigned long long word = be64toh(*(unsigned long long *)diagp);
/* Return TOD timestamp contained in a trailer entry */
static unsigned long long trailer_timestamp(struct hws_trailer_entry *te,
					    int idx)
{
	/* te->t set: TOD in STCKE format, bytes 8-15
	 * te->t not set: TOD in STCK format, bytes 0-7
	 */
	unsigned long long ts;
/* Test a sample data block. It must be 4KB or a multiple thereof in size and
 * 4KB page aligned. Each sample data page has a trailer entry at the
 * end which contains the sample entry data sizes.
 *
 * Return true if the sample data block passes the checks and set the
 * basic set entry size and diagnostic set entry size.
 *
 * Return false on failure.
 *
 * Note: Old hardware does not set the basic or diagnostic entry sizes
 * in the trailer entry. Use the type number instead.
 */
static bool s390_cpumsf_validate(int machine_type,
				 unsigned char *buf, size_t len,
				 unsigned short *bsdes,
				 unsigned short *dsdes)
{
	struct hws_basic_entry *basic = (struct hws_basic_entry *)buf;
	struct hws_trailer_entry *te;

	*dsdes = *bsdes = 0;
	if (len & (S390_CPUMSF_PAGESZ - 1))	/* Illegal size */
		return false;
	if (be16toh(basic->def) != 1)	/* No basic set entry, must be first */
		return false;
	/* Check for trailer entry at end of SDB */
	te = (struct hws_trailer_entry *)(buf + S390_CPUMSF_PAGESZ
					  - sizeof(*te));
	*bsdes = be16toh(te->bsdes);
	*dsdes = be16toh(te->dsdes);
	if (!te->bsdes && !te->dsdes) {
		/* Very old hardware, use CPUID */
		switch (machine_type) {
		case 2097:
		case 2098:
			*dsdes = 64;
			*bsdes = 32;
			break;
		case 2817:
		case 2818:
			*dsdes = 74;
			*bsdes = 32;
			break;
		case 2827:
		case 2828:
			*dsdes = 85;
			*bsdes = 32;
			break;
		case 2964:
		case 2965:
			*dsdes = 112;
			*bsdes = 32;
			break;
		default:
			/* Illegal trailer entry */
			return false;
		}
	}
	return true;
}
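The checks in s390_cpumsf_validate() can be illustrated on a plain byte buffer. The sketch below is hypothetical: validate_sdb(), get_be16() and the trailer constants TRAILER_SZ, TRAILER_BSDES and TRAILER_DSDES are assumptions standing in for the real struct hws_trailer_entry layout, and the machine-type fallback for very old hardware is omitted. It shows the power-of-two mask test on the length and decodes big-endian fields byte by byte, which gives the same result as be16toh() on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGESZ		4096u	/* assumed page size, like S390_CPUMSF_PAGESZ */
#define TRAILER_SZ	64u	/* assumed trailer size (hypothetical) */
#define TRAILER_BSDES	0u	/* assumed offset of bsdes in the trailer */
#define TRAILER_DSDES	2u	/* assumed offset of dsdes in the trailer */

/* Read a big-endian 16-bit value independent of host byte order. */
static uint16_t get_be16(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Simplified, hypothetical version of the checks in s390_cpumsf_validate():
 * - len must be a non-zero multiple of PAGESZ (mask test, PAGESZ is 2^12),
 * - the first entry must be a basic entry (big-endian 'def' == 1, assumed
 *   to be the first two bytes),
 * - entry sizes come from the trailer at the end of the first page.
 * Returns false if any check fails or the trailer holds no sizes.
 */
static bool validate_sdb(const unsigned char *buf, size_t len,
			 uint16_t *bsdes, uint16_t *dsdes)
{
	const unsigned char *te;

	*bsdes = *dsdes = 0;
	if (len == 0 || (len & (PAGESZ - 1)))
		return false;
	if (get_be16(buf) != 1)
		return false;
	te = buf + PAGESZ - TRAILER_SZ;
	*bsdes = get_be16(te + TRAILER_BSDES);
	*dsdes = get_be16(te + TRAILER_DSDES);
	return *bsdes != 0 || *dsdes != 0;
}
```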
/* Return true if there is room for another entry */
static bool s390_cpumsf_reached_trailer(size_t entry_sz, size_t pos)
{
	size_t payload = S390_CPUMSF_PAGESZ - sizeof(struct hws_trailer_entry);
	/* Correct calculation to convert time stamp in trailer entry to
	 * nanoseconds (taken from arch/s390 function tod_to_ns()).
	 * TOD_CLOCK_BASE is stored in trailer entry member progusage2.
	 */
	aux_time = trailer_timestamp(te, clock_base) - progusage2;
	aux_time = (aux_time >> 9) * 125 + (((aux_time & 0x1ff) * 125) >> 9);
	return aux_time;
}
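The conversion above relies on one TOD clock unit being 1/4096 microsecond (bit 51 of the TOD clock ticks once per microsecond), that is 1000/4096 = 125/512 ns. A standalone reimplementation of the same arithmetic for illustration (tod_to_ns() here is a local name for this sketch, not the arch/s390 helper itself):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert TOD clock units to nanoseconds: ns = tod * 125 / 512.
 * Multiplying the full 64-bit value by 125 could overflow, so the value is
 * split into tod / 512 (scaled exactly) and the remainder tod % 512, which
 * is scaled separately; the sub-nanosecond fraction is truncated.
 */
static uint64_t tod_to_ns(uint64_t tod)
{
	return (tod >> 9) * 125 + (((tod & 0x1ff) * 125) >> 9);
}
```

For example, 4096 TOD units give 1000 ns, and 4096 * 1000000 units give one second.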
/* Process the data samples of a single queue. The first parameter is a
 * pointer to the queue, the second parameter is the time stamp. This
 * is the time stamp:
 * - of the event that triggered this processing.
 * - or the time stamp when the last processing of this queue stopped.
 *   In this case it stopped at a 4KB page boundary and recorded the
 *   position where processing continues on the next invocation
 *   (see buffer->use_data and buffer->use_size).
 *
 * When this function returns the second parameter is updated to
 * reflect the time stamp of the last processed auxiliary data entry
 * (taken from the trailer entry of that page). The caller uses this
 * returned time stamp to record the last processed entry in this
 * queue.
 *
 * The function returns:
 * 0:  Processing successful. The second parameter returns the
 *     time stamp from the trailer entry until which position
 *     processing took place. Subsequent calls resume from this
 *     position.
 * <0: An error occurred during processing. The second parameter
 *     returns the maximum time stamp.
 * >0: Done on this queue. The second parameter returns the
 *     maximum time stamp.
 */
static int s390_cpumsf_samples(struct s390_cpumsf_queue *sfq, u64 *ts)
{
	struct s390_cpumsf *sf = sfq->sf;
	unsigned char *buf = sfq->buffer->use_data;
	size_t len = sfq->buffer->use_size;
	struct hws_basic_entry *basic;
	unsigned short bsdes, dsdes;
	size_t pos = 0;
	int err = 1;
	u64 aux_ts;

	/* Get trailer entry time stamp and check if entries in
	 * this auxiliary page are ready for processing. If the
	 * time stamp of the first entry is too high, the whole buffer
	 * can be skipped. In this case return the time stamp.
	 */
	aux_ts = get_trailer_time(buf);
	if (!aux_ts) {
		pr_err("[%#08" PRIx64 "] Invalid AUX trailer entry TOD clock base\n",
		       (s64)sfq->buffer->data_offset);
		aux_ts = ~0ULL;
		goto out;
	}
	if (aux_ts > *ts) {
		*ts = aux_ts;
		return 0;
	}
		/* Check for trailer entry */
		if (!s390_cpumsf_reached_trailer(bsdes + dsdes, pos)) {
			pos = (pos + S390_CPUMSF_PAGESZ)
			       & ~(S390_CPUMSF_PAGESZ - 1);
			/* Check existence of next page */
			if (pos >= len)
				break;
			aux_ts = get_trailer_time(buf + pos);
			if (!aux_ts) {
				aux_ts = ~0ULL;
				goto out;
			}
			if (aux_ts > *ts) {
				*ts = aux_ts;
				sfq->buffer->use_data += pos;
				sfq->buffer->use_size -= pos;
				return 0;
			}
		}
	}
out:
	*ts = aux_ts;
	sfq->buffer->use_size = 0;
	sfq->buffer->use_data = NULL;
	return err;	/* Buffer completely scanned or error */
}
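The resume logic of s390_cpumsf_samples() reduces to a walk over 4KB pages that stops at the first page whose trailer time stamp lies beyond the limit. In this hypothetical sketch, walk_pages() receives the per-page trailer time stamps as an array (an assumption replacing get_trailer_time() and the entry decoding) and returns the resume offset instead of adjusting buffer->use_data and buffer->use_size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGESZ 4096u	/* assumed, like S390_CPUMSF_PAGESZ */

/*
 * Hypothetical sketch of the page walk in s390_cpumsf_samples().
 * page_ts[i] is the trailer time stamp of page i; len must be a non-zero
 * multiple of PAGESZ. Pages are consumed while their trailer time stamp
 * does not exceed *ts.
 * Returns 0 if a later page was found: *ts is set to its time stamp and
 * *resume to its byte offset, so the caller can continue there later.
 * Returns 1 when the whole buffer was consumed: *ts is set to the trailer
 * time stamp of the last page and *resume to len.
 */
static int walk_pages(const uint64_t *page_ts, size_t len, uint64_t *ts,
		      size_t *resume)
{
	size_t pos = 0;
	uint64_t aux_ts;

	assert(len >= PAGESZ && len % PAGESZ == 0);
	aux_ts = page_ts[0];
	if (aux_ts > *ts) {		/* Nothing in this buffer is ready yet */
		*ts = aux_ts;
		*resume = 0;
		return 0;
	}
	for (;;) {
		/* ... decode the sample entries of page pos / PAGESZ here ... */
		pos = (pos + PAGESZ) & ~(size_t)(PAGESZ - 1);	/* Next page */
		if (pos >= len)
			break;
		aux_ts = page_ts[pos / PAGESZ];
		if (aux_ts > *ts) {
			*ts = aux_ts;
			*resume = pos;
			return 0;
		}
	}
	*ts = aux_ts;
	*resume = len;
	return 1;
}
```

With trailer time stamps 100, 200 and 300 and an event time stamp of 250, the first two pages are consumed and processing resumes at offset 8192 once an event later than 300 arrives.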
/* Run the s390 auxiliary trace decoder.
 * Select the queue buffer to operate on; the caller has already selected
 * the proper queue, depending on the second parameter 'ts'.
 * This is the time stamp until which the auxiliary entries should
 * be processed. This value is updated by called functions and
 * returned to the caller.
 *
 * Resume processing in the current buffer. If there is no buffer,
 * get a new buffer from the queue and set up the start position for
 * processing.
 * When a buffer is completely processed, remove it from the queue
 * before returning.
 *
 * This function returns
 * 1:  When the queue is empty. The second parameter will be set to
 *     the maximum time stamp.
 * 0:  Normal processing done.
 * <0: Error during queue buffer setup. This causes the caller
 *     to stop processing completely.
 */
static int s390_cpumsf_run_decoder(struct s390_cpumsf_queue *sfq,
				   u64 *ts)
{
	struct auxtrace_buffer *buffer;
	struct auxtrace_queue *queue;
	int err;

	/* Get buffer and last position in buffer to resume
	 * decoding the auxiliary entries. One buffer might be large
	 * and decoding might stop in between. This depends on the time
	 * stamp of the trailer entry in each page of the auxiliary
	 * data and the time stamp of the event triggering the decoding.
	 */
	if (sfq->buffer == NULL) {
		sfq->buffer = buffer = auxtrace_buffer__next(queue,
							     sfq->buffer);
		if (!buffer) {
			*ts = ~0ULL;
			return 1;	/* Processing done on this queue */
		}
		/* Start with a new buffer on this queue */
		if (buffer->data) {
			buffer->use_size = buffer->size;
			buffer->use_data = buffer->data;
		}
		if (sfq->logfile) {	/* Write into log file */
			size_t rc = fwrite(buffer->data, buffer->size, 1,
					   sfq->logfile);
			if (rc != 1)
				pr_err("Failed to write auxiliary data\n");
		}
	} else
		buffer = sfq->buffer;

	if (!buffer->data) {
		int fd = perf_data__fd(sfq->sf->session->data);

	/* If non-zero, there is either an error (err < 0) or the buffer is
	 * completely done (err > 0). The error is unrecoverable, usually
	 * some descriptors could not be read successfully, so continue with
	 * the next buffer.
	 * In both cases the parameter 'ts' has been updated.
	 */
	if (err) {
		sfq->buffer = NULL;
		list_del_init(&buffer->list);
		auxtrace_buffer__free(buffer);
		if (err > 0)		/* Buffer done, no error */
			err = 0;
	}
	return err;
}

	/* Dump here after copying piped trace out of the pipe */
	if (dump_trace) {
		if (auxtrace_buffer__get_data(buffer, fd)) {
			s390_cpumsf_dump_event(sf, buffer->data,
					       buffer->size);
			auxtrace_buffer__put_data(buffer);
		}
	}
	return 0;
}
static int s390_cpumsf_get_type(const char *cpuid)
{
	int ret, family = 0;

	ret = sscanf(cpuid, "%*[^,],%u", &family);
	return (ret == 1) ? family : 0;
}
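s390_cpumsf_get_type() extracts the machine type as the second comma-separated field of the cpuid string. A standalone sketch of the same sscanf() pattern, assuming a cpuid string of the form "IBM,<type>,<model>,..." (cpuid_machine_type() is a hypothetical name; it stores into an unsigned int so the argument type matches %u exactly):

```c
#include <assert.h>
#include <stdio.h>

/*
 * "%*[^,]" skips (without storing) everything up to the first comma,
 * "," consumes the comma and "%u" reads the machine type.
 * Returns 0 if no number follows the first comma.
 */
static unsigned int cpuid_machine_type(const char *cpuid)
{
	unsigned int type = 0;

	return sscanf(cpuid, "%*[^,],%u", &type) == 1 ? type : 0;
}
```

The returned type is what s390_cpumsf_validate() falls back on when old hardware leaves the trailer entry sizes unset.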
/* Check itrace options set on perf report command.
 * Return true if none are set or all options specified can be
 * handled on s390 (currently only option 'd' for logging).
 * Return false otherwise.
 */
static bool check_auxtrace_itrace(struct itrace_synth_opts *itops)
{
	bool ison = false;
/* Check for AUXTRACE dump directory if it is needed.
 * On failure print an error message but continue.
 * Return 0 on wrong keyword in config file and 1 otherwise.
 */
static int s390_cpumsf__config(const char *var, const char *value, void *cb)
{
	struct s390_cpumsf *sf = cb;
	struct stat stbuf;
	int rc;

	if (strcmp(var, "auxtrace.dumpdir"))
		return 0;
	sf->logdir = strdup(value);
	if (sf->logdir == NULL) {
		pr_err("Failed to find auxtrace log directory %s,"
		       " continue with current directory...\n", value);
		return 1;
	}
	rc = stat(sf->logdir, &stbuf);
	if (rc == -1 || !S_ISDIR(stbuf.st_mode)) {
		pr_err("Missing auxtrace log directory %s,"
		       " continue with current directory...\n", value);
		zfree(&sf->logdir);
	}
	return 1;
}
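The directory check in s390_cpumsf__config() accepts the auxtrace.dumpdir value only if stat() succeeds and reports a directory. Isolated as a hypothetical helper (is_directory() is not part of perf), the test looks like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * True only if 'path' exists and is a directory. A missing path and an
 * existing non-directory both yield false, just as both lead to the
 * "Missing auxtrace log directory" message in s390_cpumsf__config().
 */
static bool is_directory(const char *path)
{
	struct stat stbuf;

	return stat(path, &stbuf) == 0 && S_ISDIR(stbuf.st_mode);
}
```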
int s390_cpumsf_process_auxtrace_info(union perf_event *event,
				      struct perf_session *session)
{
	struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info;
	struct s390_cpumsf *sf;
	int err;

	if (auxtrace_info->header.size < sizeof(struct perf_record_auxtrace_info))
		return -EINVAL;

	sf = zalloc(sizeof(struct s390_cpumsf));
	if (sf == NULL)
		return -ENOMEM;

	if (!check_auxtrace_itrace(session->itrace_synth_opts)) {
		err = -EINVAL;
		goto err_free;
	}
	sf->use_logfile = session->itrace_synth_opts->log;
	if (sf->use_logfile)
		perf_config(s390_cpumsf__config, sf);

	err = auxtrace_queues__init(&sf->queues);
	if (err)
		goto err_free;

	sf->session = session;
	sf->machine = &session->machines.host; /* No kvm support */
	sf->auxtrace_type = auxtrace_info->type;
	sf->pmu_type = PERF_TYPE_RAW;
	sf->machine_type = s390_cpumsf_get_type(perf_session__env(session)->cpuid);