
trappy: Speed up trappy by caching trace parsing

Darryl Green requested to merge github/fork/joelagnel/for-trappy-cache into master

Created by: joelagnel

Pandas is extremely fast at parsing CSV into data frames. Astonishingly, it takes < 1s to serialize/deserialize 100MB worth of traces with 430000 events to/from CSV. We leverage this and write the data frames out to CSV files when they are created for the first time. The next time, we read them back from the cache if it exists. To make sure the cache isn't stale, we take the md5sum of the trace file and also ensure all CSVs exist before reading from the cache. With this, parsing a 100MB trace goes from 16s to 1s.
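
The staleness check described above can be sketched roughly as follows. This is a minimal illustration and not the code in this merge request: the helper names (`file_md5`, `csv_path`, `cache_is_valid`, `record_checksum`), the `md5sum` file name, and the one-CSV-per-event layout are all assumptions made for the example.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def csv_path(cache_dir, event_name):
    """Hypothetical layout: one CSV per trace event inside the cache dir."""
    return os.path.join(cache_dir, event_name + ".csv")


def cache_is_valid(trace_path, cache_dir, event_names):
    """True only if the stored checksum matches the trace and every CSV exists."""
    md5_file = os.path.join(cache_dir, "md5sum")  # assumed file name
    if not os.path.isfile(md5_file):
        return False
    with open(md5_file) as fh:
        if fh.read().strip() != file_md5(trace_path):
            return False  # trace changed since the cache was written
    return all(os.path.isfile(csv_path(cache_dir, name)) for name in event_names)


def record_checksum(trace_path, cache_dir):
    """Store the trace checksum next to the cached CSVs."""
    os.makedirs(cache_dir, exist_ok=True)
    with open(os.path.join(cache_dir, "md5sum"), "w") as fh:
        fh.write(file_md5(trace_path))
```

When `cache_is_valid` returns true, each data frame could be loaded with `pandas.read_csv(csv_path(...))`; otherwise the trace would be parsed as usual, each frame written with `DataFrame.to_csv(csv_path(...))`, and `record_checksum` called afterwards.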

Signed-off-by: Joel Fernandes <joelaf@google.com>
