Tracing

Tracing can be enabled by setting spark.comet.tracing.enabled=true.

With this feature enabled, each Spark executor will write a JSON event log file in Chrome’s Trace Event Format. The file will be written to the executor’s current working directory with the filename comet-event-trace.json.

Additionally, enabling the jemalloc feature will enable tracing of native memory allocations.

make release COMET_FEATURES="jemalloc"

Example output:

{ "name": "decodeShuffleBlock", "cat": "PERF", "ph": "B", "pid": 1, "tid": 5, "ts": 10109225730 },
{ "name": "decodeShuffleBlock", "cat": "PERF", "ph": "E", "pid": 1, "tid": 5, "ts": 10109228835 },
{ "name": "decodeShuffleBlock", "cat": "PERF", "ph": "B", "pid": 1, "tid": 5, "ts": 10109245928 },
{ "name": "decodeShuffleBlock", "cat": "PERF", "ph": "E", "pid": 1, "tid": 5, "ts": 10109248843 },
{ "name": "execute_plan", "cat": "PERF", "ph": "E", "pid": 1, "tid": 5, "ts": 10109350935 },
{ "name": "CometExecIterator_getNextBatch", "cat": "PERF", "ph": "E", "pid": 1, "tid": 5, "ts": 10109367116 },
{ "name": "CometExecIterator_getNextBatch", "cat": "PERF", "ph": "B", "pid": 1, "tid": 5, "ts": 10109479156 },

Traces can be viewed with Trace Viewer.

Example trace visualization:

tracing

Definition of Labels

Label

Meaning

jvm_heapUsed

JVM heap memory usage of live objects for the executor process

jemalloc_allocated

Native memory usage for the executor process

task_memory_comet_NNN

Off-heap memory allocated by Comet for query execution

task_memory_spark_NNN

On-heap & Off-heap memory allocated by Spark

comet_shuffle_NNN

Off-heap memory allocated by Comet for columnar shuffle