trappy: Speed up trappy by caching trace parsing
Created by: joelagnel
Pandas is extremely fast at parsing CSV into DataFrames: astonishingly, it takes less than 1s to serialize/deserialize 100MB worth of traces containing 430,000 events to/from CSV. We leverage this by writing the DataFrames out to CSV files the first time they are created; on subsequent runs we read them back from the cache if it exists. To make sure the cache isn't stale, we key it on the md5sum of the trace file and also verify that all the CSVs exist before reading from the cache. With this change, parsing a 100MB trace drops from 16s to 1s.
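A minimal sketch of the caching scheme described above, not trappy's actual implementation: the helper names (`_trace_md5`, `load_event_frames`) and the `parse_fn` callback are illustrative assumptions, but the flow mirrors the description, hash the trace, reuse the per-event CSVs only if they all exist, otherwise parse once and write them out.

```python
import hashlib
import os
import pandas as pd


def _trace_md5(trace_path, block_size=4 * 1024 * 1024):
    """Return the md5 hex digest of the trace file, read in chunks."""
    md5 = hashlib.md5()
    with open(trace_path, "rb") as f:
        for chunk in iter(lambda: f.read(block_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def load_event_frames(trace_path, event_names, parse_fn):
    """Load per-event DataFrames, using a CSV cache keyed on the trace md5.

    parse_fn(trace_path) must return a dict of event name -> DataFrame and
    is only invoked on a cache miss (hypothetical hook for the slow parser).
    """
    cache_dir = os.path.join(os.path.dirname(trace_path),
                             ".trace_cache_" + _trace_md5(trace_path))
    csv_paths = {name: os.path.join(cache_dir, name + ".csv")
                 for name in event_names}

    # Cache hit only if every expected CSV is present, which guards against
    # a partially written cache; the md5 in the directory name guards
    # against a stale cache from an older version of the trace file.
    if all(os.path.isfile(p) for p in csv_paths.values()):
        return {name: pd.read_csv(p, index_col=0)
                for name, p in csv_paths.items()}

    # Cache miss: do the slow parse once, then serialize each frame to CSV.
    frames = parse_fn(trace_path)
    os.makedirs(cache_dir, exist_ok=True)
    for name, df in frames.items():
        df.to_csv(csv_paths[name])
    return frames
```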
Signed-off-by: Joel Fernandes <joelaf@google.com>