Skip to content
  • James Clark's avatar
    perf cs-etm: Split Coresight decode by aux records · 83d1fc92
    James Clark authored
    
    
    Populate the auxtrace queues using AUX records rather than whole
    auxtrace buffers so that the decoder is reset between each aux record.
    
    This is similar to the auxtrace_queues__process_index() ->
    auxtrace_queues__add_indexed_event() flow where
    perf_session__peek_event() is used to read AUXTRACE events out of random
    positions in the file based on the auxtrace index.
    
    But now we loop over all PERF_RECORD_AUX events instead of AUXTRACE
    buffers. For each PERF_RECORD_AUX event, we find the corresponding
    AUXTRACE buffer using the index, and add a fragment of that buffer to
    the auxtrace queues.
    
    No other changes to decoding were made, apart from populating the
    auxtrace queues. The result of decoding is identical to before, except
    in cases where decoding failed completely, due to not resetting the
    decoder.
    
    The reason for this change is because AUX records are emitted any time
    tracing is disabled, for example when the process is scheduled out.
    Because ETM was disabled and enabled again, the decoder also needs to be
    reset to force the search for a sync packet. Otherwise there would be
    fatal decoding errors.
    
    Testing
    =======
    
    Testing was done with the following script, to diff the decoding results
    between the patched and un-patched versions of perf:
    
    	#!/bin/bash
    	set -ex
    
    	$1 script -i $3 $4 > split.script
    	$2 script -i $3 $4 > default.script
    
    	diff split.script default.script | head -n 20
    
    And it was run like this, with various itrace options depending on the
    quantity of synthesised events:
    
    	compare.sh ./perf-patched ./perf-default perf-per-cpu-2-threads.data --itrace=i100000ns
    
    No changes in output were observed in the following scenarios:
    
    * Simple per-cpu
    	perf record -e cs_etm/@tmc_etr0/u top
    
    * Per-thread, single thread
    	perf record -e cs_etm/@tmc_etr0/u --per-thread ./threads_C
    
    * Per-thread multiple threads (but only one thread collected data):
    	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
    
    * Per-thread multiple threads (both threads collected data):
    	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
    
    * Per-cpu explicit threads:
    	perf record -e cs_etm/@tmc_etr0/u --pid 853,854
    
    * System-wide (per-cpu):
        perf record -e cs_etm/@tmc_etr0/u -a
    
    * No data collected (no aux buffers)
    	Can happen with any command when run for a short period
    
    * Containing truncated records
    	Can happen with any command
    
    * Containing aux records with 0 size
    	Can happen with any command
    
    * Snapshot mode (various files with and without buffer wrap)
    	perf record -e cs_etm/@tmc_etr0/u -a --snapshot
    
    Some differences were observed in the following scenario:
    
    * Snapshot mode (with duplicate buffers)
    	perf record -e cs_etm/@tmc_etr0/u -a --snapshot
    
    Fewer samples are generated in snapshot mode if duplicate buffers
    were gathered because buffers with the same offset are now only added
    once. This gives different, but more correct results and no duplicate
    data is decoded any more.
    
    Signed-off-by: default avatarJames Clark <james.clark@arm.com>
    Reviewed-by: default avatarMathieu Poirier <mathieu.poirier@linaro.org>
    Tested-by: default avatarLeo Yan <leo.yan@linaro.org>
    Cc: Al Grant <al.grant@arm.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: Branislav Rankov <branislav.rankov@arm.com>
    Cc: Denis Nikitin <denik@chromium.org>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: John Garry <john.garry@huawei.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mike Leach <mike.leach@linaro.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: coresight@lists.linaro.org
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lore.kernel.org/lkml/20210624164303.28632-2-james.clark@arm.com
    
    
    Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
    83d1fc92