Comet Debugging Guide¶
This HOWTO describes how to debug JVM code and Native code concurrently. The guide assumes you have:
IntelliJ as the Java IDE
CLion as the Native IDE. For Rust code, the CLion Rust language plugin is required. Note that the IntelliJ Rust plugin is not sufficient.
CLion/LLDB as the native debugger. CLion ships with a bundled LLDB, and the Rust community has its own packaging of LLDB (rust-lldb). Both provide a better display of Rust symbols than plain LLDB or the LLDB bundled with Xcode. We will use the LLDB bundled with CLion for this guide.
We will use a Comet unit test as the canonical use case.
Caveat: The steps here have only been tested with JDK 11 on Mac (M1).
Debugging for Advanced Developers¶
Add a .lldbinit to comet/core. This is not strictly necessary, but it will be useful if you want to use advanced lldb debugging.
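As an illustration only, a minimal .lldbinit might just load the Rust pretty-printers that ship with a rustup toolchain; the toolchain path below is an assumption, so substitute the sysroot reported by rustc --print sysroot:
# Hypothetical comet/core/.lldbinit -- loads the Rust formatters bundled with a rustup toolchain
# The toolchain directory is an example; use the one under `rustc --print sysroot`
command script import ~/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/etc/lldb_lookup.py
command source ~/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/etc/lldb_commands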
In IntelliJ¶
Set a breakpoint in NativeBase.load(), at a point after the Comet library has been loaded.
Add a Debug Configuration for the unit test.
In the Debug Configuration for that unit test, add -Xint as a JVM parameter. This runs the JVM in interpreted-only mode; without it, the LLDB debugger hits an EXC_BAD_ACCESS (or EXC_BAD_INSTRUCTION) from which one cannot recover.
Add a println to the unit test to print the PID of the JVM process. (jps can also be used, but a println is less error prone if you have multiple JVM processes running.)
println("Waiting for Debugger: PID - ", ManagementFactory.getRuntimeMXBean().getName())
This will print something like: PID@your_machine_name.
For JDK 9 and newer:
println("Waiting for Debugger: PID - ", ProcessHandle.current.pid)
==> Note the PID
Debug-run the test in IntelliJ and wait for the breakpoint to be hit
In CLion¶
After the breakpoint is hit in IntelliJ, in CLion (or in LLDB from a terminal or editor):
Attach to the JVM process (make sure the PID matches). In CLion, this is Run -> Attach to Process.
Put your breakpoint in the native code
Go back to IntelliJ and resume the process.
Most debugging in CLion is similar to IntelliJ. For advanced LLDB-based debugging, the LLDB command line can be accessed from the LLDB tab in the Debugger view. Refer to the LLDB manual for LLDB commands.
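For reference, attaching from a terminal LLDB instead of CLion might look roughly like this; the PID is a placeholder, and Java_org_apache_comet_Native_executePlan is merely one example symbol to break on (it is the JNI entry point that appears in the backtraces later in this guide):
# Hypothetical terminal session: attach to the JVM that is paused at the IntelliJ breakpoint
lldb -p 12345
# Inside LLDB: set a breakpoint in the native code, then continue; the breakpoint is hit
# once the process is resumed from IntelliJ
(lldb) breakpoint set --name Java_org_apache_comet_Native_executePlan
(lldb) continue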
After your debugging is done¶
In CLion, detach from the process if not already detached
In IntelliJ, the debugger might have lost track of the process. If so, the debugger tab will show the process as running (even if the test/job is shown as completed).
Close the debugger tab, and if the IDE asks whether it should terminate the process, click Yes.
In a terminal, use jps to check whether the process you were debugging is still running. If it is, kill it with kill -9 [pid]. If that does not remove the process, don't bother: it will be left behind as a zombie, consume no significant resources, and eventually be cleaned up when you reboot (possibly after a software update).
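For example (the PID below is a placeholder):
# List running JVMs with their main classes to spot the leftover debuggee
jps -l
# Force-kill it if it is still listed
kill -9 12345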
Additional Info¶
OpenJDK mailing list on debugging the JDK on MacOS https://mail.openjdk.org/pipermail/hotspot-dev/2019-September/039429.html
Detecting the debugger https://stackoverflow.com/questions/5393403/can-a-java-application-detect-that-a-debugger-is-attached
Verbose debug¶
By default, Comet outputs only the exception details specific to Comet:
scala> spark.sql("my_failing_query").show(false)
24/03/05 17:00:07 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.comet.CometNativeException: Internal error: MIN/MAX is not expected to receive scalars of incompatible types (Date32("NULL"), Int32(15901)).
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
at org.apache.comet.Native.executePlan(Native Method)
at org.apache.comet.CometExecIterator.executeNative(CometExecIterator.scala:65)
at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:111)
at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:126)
A more verbose exception message is available by leveraging DataFusion backtraces: this option appends the native DataFusion stack trace to the original error message.
To enable it with Comet, include the backtrace feature in Cargo.toml for the DataFusion dependencies:
datafusion-common = { version = "36.0.0", features = ["backtrace"] }
datafusion = { default-features = false, version = "36.0.0", features = ["unicode_expressions", "backtrace"] }
Then rebuild Comet as described in the build documentation.
Start Comet with RUST_BACKTRACE=1:
RUST_BACKTRACE=1 $SPARK_HOME/spark-shell --jars spark/target/comet-spark-spark3.4_2.12-0.5.0-SNAPSHOT.jar --conf spark.plugins=org.apache.spark.CometPlugin --conf spark.comet.enabled=true --conf spark.comet.exec.enabled=true
Get the expanded exception details:
scala> spark.sql("my_failing_query").show(false)
24/03/05 17:00:49 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.comet.CometNativeException: Internal error: MIN/MAX is not expected to receive scalars of incompatible types (Date32("NULL"), Int32(15901))
backtrace:
0: std::backtrace::Backtrace::create
1: datafusion_physical_expr::aggregate::min_max::min
2: <datafusion_physical_expr::aggregate::min_max::MinAccumulator as datafusion_expr::accumulator::Accumulator>::update_batch
3: <futures_util::stream::stream::fuse::Fuse<S> as futures_core::stream::Stream>::poll_next
4: comet::execution::jni_api::Java_org_apache_comet_Native_executePlan::{{closure}}
5: _Java_org_apache_comet_Native_executePlan
(reduced)
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
at org.apache.comet.Native.executePlan(Native Method)
at org.apache.comet.CometExecIterator.executeNative(CometExecIterator.scala:65)
at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:111)
at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:126)
(reduced)
Note:
Backtrace coverage in DataFusion is still improving, so there is a chance a given error is not yet covered; if so, feel free to file a ticket.
Backtrace evaluation comes with a performance cost and is intended mostly for debugging purposes.