\chapter{Runtime detection of data races with ThreadSanitizer} \label{c:tsan}\cutname{tsan.html} %HEVEA\cutname{tsan.html} \section{s:tsan-overview}{Overview and usage} OCaml, since version 5.0, allows shared-memory parallelism and thus mutation of data shared between multiple threads. This creates the possibility of data races, i.e., unordered accesses to the same memory location with at least one of them being a write. In OCaml, data races are easy to introduce, and the behaviour of programs with data races can be unintuitive — the observed behaviours cannot be explained by simply interleaving operations from different concurrent threads. More information about data races and their consequences can be found in section~\ref{s:par_mm_easy} and Chapter~\ref{c:memorymodel}. To help detect data races, OCaml supports ThreadSantizer (TSan), a dynamic data race detector that has been successfully used in languages such as C/C++, Swift, etc. TSan support for OCaml is available since OCaml 5.2. To use TSan, you must configure the compiler with \texttt{--enable-tsan}. You can also install an \texttt{opam} switch with the TSan feature enabled as follows: \begin{verbatim} opam switch create ocaml-option-tsan \end{verbatim} TSan support for OCaml is currently available for the x86_64 architecture, on FreeBSD, Linux and macOS, for the arm64 architecture on Linux and macOS, and for the POWER, riscv and s390x architectures on Linux. Building OCaml with TSan support requires GCC or Clang. Minimal supported versions are GCC 11 and Clang 14. Note that TSan data race reports with GCC 11 are known to result in poor stack trace reporting (no line numbers), which is fixed in GCC 12. A TSan-enabled compiler differs from a regular compiler in the following way: all programs compiled by \texttt{ocamlopt} are instrumented with calls to the TSan runtime, and TSan will detect data races encountered during execution. For instance, consider the following program: \begin{caml_example*}{verbatim} let a = ref 0 and b = ref 0 let d1 () = a := 1; !b let d2 () = b := 1; !a let () = let h = Domain.spawn d2 in let r1 = d1 () in let r2 = Domain.join h in assert (not (r1 = 0 && r2 = 0)) \end{caml_example*} This program has data races. The memory locations \texttt{a} and \texttt{b} are read and written concurrently by multiple domains \texttt{d1} and \texttt{d2}. \texttt{a} and \texttt{b} are ``non-atomic'' locations according to the memory model (see Chapter~\ref{c:memorymodel}), and there is no synchronization between accesses to them. Hence, there are two data races here corresponding to the two memory locations \texttt{a} and \texttt{b}. When you compile and run this program with \texttt{ocamlopt}, you may observe data race reports on the standard error, such as: \begin{verbatim} ================== WARNING: ThreadSanitizer: data race (pid=3808831) Write of size 8 at 0x8febe0 by thread T1 (mutexes: write M90): #0 camlSimple_race.d2_274 simple_race.ml:8 (simple_race.exe+0x420a72) #1 camlDomain.body_706 stdlib/domain.ml:211 (simple_race.exe+0x440f2f) #2 caml_start_program (simple_race.exe+0x47cf37) #3 caml_callback_exn runtime/callback.c:197 (simple_race.exe+0x445f7b) #4 domain_thread_func runtime/domain.c:1167 (simple_race.exe+0x44a113) Previous read of size 8 at 0x8febe0 by main thread (mutexes: write M86): #0 camlSimple_race.d1_271 simple_race.ml:5 (simple_race.exe+0x420a22) #1 camlSimple_race.entry simple_race.ml:13 (simple_race.exe+0x420d16) #2 caml_program (simple_race.exe+0x41ffb9) #3 caml_start_program (simple_race.exe+0x47cf37) [...] WARNING: ThreadSanitizer: data race (pid=3808831) Read of size 8 at 0x8febf0 by thread T1 (mutexes: write M90): #0 camlSimple_race.d2_274 simple_race.ml:9 (simple_race.exe+0x420a92) #1 camlDomain.body_706 stdlib/domain.ml:211 (simple_race.exe+0x440f2f) #2 caml_start_program (simple_race.exe+0x47cf37) #3 caml_callback_exn runtime/callback.c:197 (simple_race.exe+0x445f7b) #4 domain_thread_func runtime/domain.c:1167 (simple_race.exe+0x44a113) Previous write of size 8 at 0x8febf0 by main thread (mutexes: write M86): #0 camlSimple_race.d1_271 simple_race.ml:4 (simple_race.exe+0x420a01) #1 camlSimple_race.entry simple_race.ml:13 (simple_race.exe+0x420d16) #2 caml_program (simple_race.exe+0x41ffb9) #3 caml_start_program (simple_race.exe+0x47cf37) [...] ================== ThreadSanitizer: reported 2 warnings \end{verbatim} For each detected data race, TSan reports the location of the conflicting accesses, their nature (read, write, atomic read, etc.), and the associated stack trace. If you run the above program several times, the output may vary: sometimes TSan will report two data races, sometimes one, and sometimes none. This is due to the combination of two factors: \begin{itemize} \item First, TSan reports only the data races encountered during execution, i.e., conflicting, unordered memory accesses that are effectively performed. \item In addition, in this program, depending on executions, there may be no such memory accesses: if \texttt{d1} returns before \texttt{d2} has finished spawning, then all memory accesses originating from \texttt{d1} may happen-before the ones originating from \texttt{d2}, since spawning a domain involves inter-thread synchronization. In that case, the accesses are considered to be ordered and no data race is reported. \end{itemize} This example illustrates the fact that data races can sometimes be hidden by unrelated synchronizing operations. \section{s:tsan-performance}{Performance implications} TSan instrumentation imposes a non-negligible cost at runtime. Empirically, this cost has been observed to cause a slowdown which can range from 2x to 7x. One of the main factors of high slowdowns is frequent access to mutable data. In contrast, the initialising writes to and reads from immutable memory locations are not instrumented. TSan also allocates very large amounts of virtual memory, although it uses only a fraction of it. The memory consumption is increased by a factor between 4 and 7. \section{s:tsan-false-neg-false-pos}{False negatives and false positives} As illustrated by the previous example, TSan will only report the data races encountered during execution. Another important caveat is that TSan remembers only a finite number of memory accesses per memory location. At the time of writing, this number is 4. Data races involving a forgotten access will not be detected. Lastly, the \href{https://github.com/google/sanitizers/wiki/ThreadSanitizerAlgorithm}{documentation of TSan} states that there is a tiny probability to miss a race if two threads access the same location at the same time. TSan may overlook data races only in these three specific cases. For data races between two memory accesses made from OCaml code, TSan does not produce false positives; that is, TSan will not emit spurious reports. When mixing OCaml and C code, through the use of C primitives, the very notion of false positive becomes less clear, as it involves two memory models -- OCaml and C11. However, TSan should behave mostly as one would expect: non-atomic reads and writes in C will race with non-atomic reads and writes in OCaml, and C atomics will not race with OCaml atomics. There is one theoretical possibility of false positive: if a \texttt{value} is initialized from C without using \texttt{caml_initialize} (which is allowed under the condition that the GC does not run between the allocation and the write, see Chapter~\ref{c:intf-c}) and a conflicting access is made later by another thread. This does not constitute a data race, but TSan may report it as such. \section{s:tsan-runtime-flags}{Runtime options} TSan supports a number of configuration options at runtime using the \texttt{TSAN\_OPTIONS} environment variable. \texttt{TSAN\_OPTIONS} should contain one or more options separated by spaces. See the \href{https://github.com/google/sanitizers/wiki/ThreadSanitizerFlags}{documentation of TSan flags} and the \href{https://github.com/google/sanitizers/wiki/SanitizerCommonFlags}{documentation of flags common to all sanitizers} for more information. Notably, \texttt{TSAN\_OPTIONS} allows suppressing some data races from TSan reports. Suppressing data race reports is useful for intentional races or libraries that cannot be fixed. For example, to suppress reports originating from functions in the OCaml module \texttt{My_module}, one can run \begin{verbatim} TSAN_OPTIONS="suppressions=suppr.txt" ./my_instrumented_program \end{verbatim} with \texttt{suppr.txt} a file containing: \begin{verbatim} race_top:^camlMy_module \end{verbatim} (Note that this depends on the format of OCaml symbols in the executable. Some builders, like Dune, might result in different formats. You should adapt this example to the symbols effectively present in your stack traces.) The \texttt{TSAN\_OPTIONS} variable also allows for increasing the ``history size'', e.g.: \begin{verbatim} TSAN_OPTIONS="history_size=7" ./my_instrumented_program \end{verbatim} TSan’s history records events such as function entry and exit, and is used to reconstruct stack traces. Increasing the history size can sometimes be necessary to obtain the second stack trace, but it also increases memory consumption. This setting does not change the number of memory accesses remembered per memory location. Another useful runtime option is \texttt{exitcode=0}, which still reports data races but does not change the exit code. This can be useful if TSan complains about data races in programs that you don't care about and the non-zero exit code disturbs your workflow. \section{s:tsan-c-code}{Guidelines for linking} As a general rule, OCaml programs instrumented with TSan should only be linked with OCaml or C objects also instrumented with TSan. Doing otherwise may result in crashes. The only exception to this rule are C libraries that do not call into the OCaml runtime system in any way, i.e., do not allocate, raise exceptions, call back into OCaml code, etc. Examples include the libc or system libraries. Data races in non-instrumented libraries will not be reported. C code interacting with OCaml should always be built through the \texttt{ocamlopt} command, which will pass the required instrumentation flags to the C compiler. The \texttt{CAMLno_tsan} qualifier can be used to prevent functions from being instrumented: \begin{verbatim} CAMLno_tsan void f(int arg) { /* This function will not be instrumented. */ ... } \end{verbatim} Races from non-instrumented functions will not be reported. \texttt{CAMLno_tsan} should only be used by experts. It can be used to reduce the performance overhead in certain corner cases, or to suppress some known alarms. For the latter, using a suppressions file with \texttt{TSAN\_OPTIONS} should be preferred when possible, as it allows for finer-grained control, and qualifying a function \texttt{f} with \texttt{CAMLno_tsan} results in missing entries in TSan’s stack traces when a data race happens in a transitive callee of \texttt{f}. There is no way to disable instrumentation in OCaml code. \section{s:tsan-signal-changes}{Changes in the delivery of signals} TSan intercepts all signals and passes them down to the instrumented program. This overlay from TSan is not always transparent for the program. Synchronous signals such as \texttt{SIGSEV}, \texttt{SIGILL}, \texttt{SIGBUS}, etc.\ will be passed down immediately, whereas asynchronous signals such as \texttt{SIGINT} will be delayed until the next call to the TSan runtime (e.g.\ until the next access to mutable data). This limitation of TSan can have surprising effects: for instance, pure, recursive functions that do not allocate cannot be interrupted until they terminate.