Upstream "mark-delay" change from flambda-backend.
Hack to work around an accounting problem: artificially catch up work_counter at the start of any slice when it falls very far behind alloc_counter.
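A minimal sketch of the catch-up idea, not the runtime's actual code. The threshold MAX_LAG, the struct layout and the choice to move work_counter to exactly MAX_LAG behind alloc_counter are all assumptions made for illustration; only the two counter names come from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold: how far work_counter may lag before we catch up. */
#define MAX_LAG ((intptr_t)1 << 20)

typedef struct {
  uintptr_t alloc_counter;  /* work owed, grows with allocation */
  uintptr_t work_counter;   /* work done, grows as slices run */
} gc_counters;

/* Called at the start of a slice.
   Returns 1 if work_counter was moved forward, 0 otherwise. */
static int catch_up_if_far_behind(gc_counters *c)
{
  assert(c != NULL);
  /* The counters may wrap around, so compare through the signed difference. */
  intptr_t lag = (intptr_t)(c->alloc_counter - c->work_counter);
  if (lag > MAX_LAG) {
    c->work_counter = c->alloc_counter - (uintptr_t)MAX_LAG;
    return 1;
  }
  return 0;
}
```

Clamping to a bounded lag rather than to zero keeps some outstanding work for the slice; whether the real hack does this is not stated here.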
Previously, the flexdll support objects were placed in both byte/bin and opt/bin with the copy of flexlink.exe when flexlink was being bootstrapped with OCaml. The objects are small, so the copying was not particularly onerous.
However, if opt/bin/flexlink.exe is a native Windows symlink (pointing to ../../flexlink.opt.exe) then Sys.executable_name when flexlink runs will point to the wrong place. While flexlink ought to be checking Sys.argv.(0) rather than Sys.executable_name, a better hardening is to be explicit and set the FLEXDIR environment variable to point to the directory containing the support objects. This also allows byte/bin/flexlink.exe and opt/bin/flexlink.exe to share the same copy of the objects.
Refactor [is_functor_arg] table of env into [not_aliasable]
- In [env.ml], renamed the internal table used to track non aliasable modules from [Env_functor_arg] to [Env_not_aliasable], renamed the associated test function from [is_functor_arg] to [is_aliasable], renamed the [~arg] flags of some functions into [~noalias]
- In [includemod.ml], removed the redundant [can_alias] function
- In [typemod.ml], changed the error message for [Cannot_alias]
Tests a full `--disable-shared` build on Linux and also a Linux build with as many options disabled as possible (as the minimal build in the other-configs job on Jenkins also does).
The matrix is expanded by adding the 'CI: Full matrix' label to a pull request.
If Cygwin is running "elevated" - which it is in CI - then it acts as though it's running as root. It intentionally activates SeBackupPrivilege, which thwarts the test_create_cursor_failures.ml test.
The OCaml testsuite will never require root privileges for anything meaningful, so ocamltest simply drops SeBackupPrivilege when running on Cygwin, which means the test correctly fails.
- Typedtrees are no longer built inside [merge_constraint] but inside [transl_with], which removes the need for a special approximation case: merging always returns a module type, not a Typedtree. Changed [transl_with] to build the Typedtree there.
- Removed the [real_ids] mechanism that was used to store (imperatively) the list of affected paths. Now the patches (defined by [return], [return_payload] and [return_paths]) store both the resulting path and the list of affected paths.
- Added a [payload] mechanism used only for the type constraint case, where the replacement declaration is returned as an additional payload. Other cases return [None].
- Create separate functions ([merge_type], [merge_module], [merge_modtype])
- Extracted the post-processing (wellformedness checks and substitutions) into a helper function [post_process]
[refactor merge] Extracted the recursive functions for deep constr
- Broke down the main merging function into three parts: [merge_signature], [patch_deep_item] and [patch_all]. The first two are mutually recursive and use an extra argument [~patch]. For now, the only patch provided (in [merge_type], [merge_module], etc) is [patch_all].
- Moved the patching logic from [patch_all] to a specialized patch function in [merge_module]. Merged the common parts of the destructive and non-destructive cases
[breaking] Change the prototype of [caml_atomic_cas_field].
This is a breaking change because this function was (unfortunately) exposed outside CAML_INTERNALS, and is used by exactly one external user, you guessed it: https://github.com/ocaml-multicore/multicore-magic/blob/360c2e829c9addeca9ccaee1c71f4ad36bb14a79/src/Multicore_magic.mli#L181-L185 https://github.com/ocaml-multicore/multicore-magic/blob/360c2e829c9addeca9ccaee1c71f4ad36bb14a79/src/unboxed5/multicore_magic_atomic_array.ml#L36-L43
We chose to change the prototype to remain consistent with the naming convention for the new caml_atomic_*_field primitives, which will be added to support atomic record fields.
User code can easily adapt to this new prototype we are using, but not in a way that is compatible with both old and new versions of OCaml (not without some preprocessing at least).
Another option would be to expose
    int caml_atomic_cas_field(value obj, intnat fld, value, value)
    value caml_atomic_cas_field_boxed(value obj, value vfld, value, value)
but no other group of primitives in the runtime uses this _boxed terminology, they instead use
    int caml_atomic_cas_field_unboxed(value obj, intnat fld, value, value)
    value caml_atomic_cas_field(value obj, value vfld, value, value)
and this would again break compatibility -- it is not easier to convert code to that two-version proposal, and not noticeably more efficient.
So in this case we decided to break compatibility (of an obscure, experimental, undocumented but exposed feature) in favor of consistency and simplicity of the result.
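For readers unfamiliar with the "pointer plus field offset" shape discussed above, here is a rough standalone sketch using C11 atomics. It is not the runtime's implementation: the `word` type stands in for `value`, the name `cas_field` is hypothetical, and write barriers and boxing of the result are ignored entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef intptr_t word;  /* stand-in for the runtime's `value` type */

/* Atomically replace field `fld` of `obj` with `newv` if it currently
   holds `oldv`. Returns 1 on success and 0 otherwise, mirroring the
   int-returning, unboxed-offset flavour of the prototypes above. */
static int cas_field(_Atomic word *obj, intptr_t fld, word oldv, word newv)
{
  assert(fld >= 0);
  return atomic_compare_exchange_strong(&obj[fld], &oldv, newv) ? 1 : 0;
}
```

The offset is passed as a separate machine integer, so a caller that already holds an untagged index pays no boxing cost.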
Uses of existing atomic primitives %atomic_foo, which act on single-field references, are now translated into %atomic_foo_field, which act on a pointer and an offset -- passed as separate arguments.
In particular, note that the arity of the internal Lambda primitive Patomic_load increases by one with this patchset. (Initially we renamed it into Patomic_load_field but this creates a lot of churn for no clear benefits.)
We also support primitives of the form %atomic_foo_loc, which expects a pair of a pointer and an offset (as a single argument), as we proposed in the RFC on atomic fields https://github.com/ocaml/RFCs/pull/39 (but there is no language-level support for atomic record fields yet)
To reproduce (see BOOTSTRAP.adoc for details):
- go to the earlier commit "lambda: Add support for new atomic primitives" as a known-good state, build that one
- keep the build artifacts around, come here and do `make bootstrap`
This bootstrap is not required by a compiler change, but it enables the use of the predefined type `'a atomic_loc` and the expression-former [%atomic.loc ...] in the standard library.
Runtime events: dispatch the right event message type (#13970)
* Dispatch the right event message type
`type.runtime | type.user` does not really make sense as `.runtime` and `.user` are two projections of the same union type.
Prevents a MSVC 19.44.35109.1 warning:
runtime/runtime_events.c(595): warning C5287: operands are different enum types 'ev_runtime_message_type' and 'ev_user_message_type'; use an explicit cast to silence this warning
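The following sketch illustrates why OR-ing the two projections is meaningless and what dispatching looks like instead. All type and constant names here are hypothetical stand-ins, not the ones in runtime_events.c.

```c
/* Two unrelated enum types that share storage through a union. */
typedef enum { RT_MSG_BEGIN = 1, RT_MSG_END = 2 } rt_msg_type;
typedef enum { USER_MSG_CUSTOM = 5 } user_msg_type;
typedef enum { CAT_RUNTIME, CAT_USER } msg_category;

typedef union {
  rt_msg_type runtime;
  user_msg_type user;
} msg_type;

/* Both members alias the same bytes, so `t.runtime | t.user` would just
   OR a value with itself while mixing two enum types. Select the member
   that matches the category instead. */
static unsigned msg_type_code(msg_category cat, msg_type t)
{
  return cat == CAT_RUNTIME ? (unsigned)t.runtime : (unsigned)t.user;
}
```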
Co-authored-by: Gabriel Scherer <gabriel.scherer@gmail.com>
* Fit in 80 cols
---------
Co-authored-by: Gabriel Scherer <gabriel.scherer@gmail.com>
yacc/reader.c:62:1: error: initializer-string for character array is too long, array size is 32 but initializer has size 33 (including the null terminating character); did you mean to use the 'nonstring' attribute? [-Werror,-Wunterminated-string-initialization]
   62 | "\000\000\000\000\000\000\000\000\376\377\377\207\376\377\377\007\000\000\000\000\000\000\000\000\377\377\177\377\377\377\177\377";
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
yacc/reader.c:64:1: error: initializer-string for character array is too long, array size is 32 but initializer has size 33 (including the null terminating character); did you mean to use the 'nonstring' attribute? [-Werror,-Wunterminated-string-initialization]
   64 | "\000\000\000\000\200\000\377\003\376\377\377\207\376\377\377\007\000\000\000\000\000\000\000\000\377\377\177\377\377\377\177\377";
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
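One way to avoid this class of diagnostic is to initialize a fixed-size byte table from a brace list rather than from a string literal, so no terminating NUL is implied. The sketch below is only an illustration on a hypothetical 4-byte table; it is not necessarily the fix applied to yacc/reader.c, and treating the table as a bitmap is an assumption.

```c
/* Hypothetical 4-byte table. With a string literal such as
   "\000\376\377\207" the initializer would carry a 5th byte (the NUL)
   that does not fit; a brace list has exactly 4 elements. */
static const unsigned char bitmap[4] = { 0000, 0376, 0377, 0207 };

/* Test bit `i` of the table, least significant bit of each byte first. */
static int bit_is_set(const unsigned char *map, unsigned i)
{
  return (map[i >> 3] >> (i & 7)) & 1;
}
```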
caml_executable_name is always called in native startup and for all the non-default bytecode linking mechanisms. Bytecode startup now always calls caml_executable_name, and this value is stored along with exe_name.
caml_sys_proc_self_exe returns this stored value as a string option. It returns None if caml_executable_name is not implemented on a given platform.
In native mode, this is the same as Sys.executable_name; in bytecode, it is the path to the interpreter executing Sys.executable_name, which may not be the same file.
Better error messages on invalid recursive module definitions
Partially addresses the issue of ambiguous error messages when no safe module is defined in a recursive module chain. The error messages should list the full path of the values that cause the module to be unsafe.
Format: place hint white spaces after the break hint
When formatting with margin > 9,
"@[aaaa@ bbbb@;<∞ 0>cccc@]"
the `a` and `b` blocks fit inside the margin, and thus this text ought to be formatted as
    aaaa bbbb
    cccc
However, before this commit `Format` rendered this text as
    aaaa
    bbbb
    cccc
because it attributed the size of the horizontal contents of the `@;<∞ 0>` break hint to the pending break hint `@ `. This commit fixes this issue by attributing the size of the horizontal contents of a break hint to the break hint itself rather than to any pending break hint.
Reimplement generational stack scanning a la OCaml 4
Uses spare bits in return addresses to mark already-scanned stack frames. Currently works on
- POWER
- RISC-V
- ARM 64-bits in Top Bits Ignore mode (i.e. under Linux but not under macOS)
ARM64: explicitly ignore top bits in return addresses
Unless the hardware is in top-bits-ignore mode already.
As a consequence, generational stack scanning is supported on all ARM64 platforms, incl. Apple Silicon / macOS.
The overhead of the extra masking instruction before every `ret` instruction is low: 1% to 1.5% code size increase; run-time increase is lost in the noise.
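As a rough model of the marking scheme, the sketch below tags a 64-bit return address with a "scanned" bit and strips it again before use, which is the role of the extra masking instruction. Using bit 63 is an assumption for illustration; the bits actually used differ per architecture and are not stated here.

```c
#include <stdint.h>

/* Hypothetical choice of spare bit: the topmost bit of the address. */
#define SCANNED_BIT ((uint64_t)1 << 63)

/* Mark a frame's return address as already scanned. */
static uint64_t mark_scanned(uint64_t retaddr)
{
  return retaddr | SCANNED_BIT;
}

/* Has this frame been scanned by an earlier minor collection? */
static int is_scanned(uint64_t retaddr)
{
  return (retaddr & SCANNED_BIT) != 0;
}

/* Recover the real return address; needed before returning through it
   unless the hardware ignores the top bits by itself. */
static uint64_t strip_mark(uint64_t retaddr)
{
  return retaddr & ~SCANNED_BIT;
}
```

A scanner can stop walking the stack at the first frame whose return address is already marked, since everything below it was scanned previously.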
Use BUILD_PATH_PREFIX_MAP to sanitize debug event paths, do not rewrite shebang
1. bytecomp/emitcode.ml
Sanitize the paths in debug events using BUILD_PATH_PREFIX_MAP. However if the mapping has no effect, then do nothing.
2. bytecomp/bytelink.ml
Do not do BUILD_PATH_PREFIX_MAP mapping of the path supplied by the user with the `-use-runtime` option. This is used to fill in the shebang part of the executable, and an abstract path is unlikely to work there.
Accept native freestanding targets at configure time
Accept `*-none` and `*-elf*` triplets for all the architectures with a native backend to describe the corresponding freestanding targets; `none` and `elf*` are the most commonly used last components in triplets for freestanding targets. Set `system` to `none` and `os_type` to `None` in such cases.
Allow `*-ocaml` as target triplets to build freestanding cross compilers
Allow `ocaml` to be used as the last component of the target triplet in case we are using a custom toolchain for a freestanding target. The target triplet is then temporarily rewritten to "<arch>-none" to compute the canonical target. This allows using `*-*-ocaml-` prefixes (`x86_64-solo5-ocaml-`, for instance) to create cross-compiler toolchains dedicated to specific freestanding targets.
Add missing `item-attribute` rule for `let-binding` in documentation of attributes (#14077)
This reflects the grammar in https://github.com/ocaml/ocaml/blob/8761443617f229d5fe683ed2570aa79c8d64348a/parsing/parser.mly#L2742-L2759 and without this rule, the documentation doesn't account for forms like
Previously, Compmisc.init_path initialised the load path using Config.standard_library, but this can now be altered via an optional ?standard_library argument. This is used internally when testing compiler installations in order to allow Ccomp.call_linker to be used.
Fix flakiness of TSan tests using flushes and synchronization
Co-authored-by: Fabrice Buoro <fabbing@free.fr>
All logging output is moved to stderr, the same output where TSan dumps its race reports. This is to help understanding what happens if the output of this test ever changes.
Additionally, a second synchronizing barrier is added to some tests to remove flakiness.
The existing barrier ensured that
1. there was a data race, by delaying the synchronizing `Domain.join` until after both domains had accessed the shared mutable field; and
2. that these accesses always happened in the same order (write first or read first).
The role of the new barrier is to always enforce the same order between the TSan report and logging lines such as `"Leaving f"`. Not enforcing that order was the source of flakiness in these tests.
Make the libunix implementation common for Windows and POSIX.
The Windows implementation was using a simple call to MoveFileEx, but rename_os aliases to caml_win32_rename, which is a more portable and POSIX-like reimplementation, with fixes from #12320 and before.
Symmetrize caml_sys_system_command and caml_unix_system
- caml_unix_system on Windows would raise ENOENT if the command string wasn't C safe. Prefer raising EINVAL, as caml_sys_system_command does.
- caml_sys_system_command did not call _flushall on Windows as caml_unix_system did.
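For context, a string is "C safe" when it contains no embedded NUL byte, so that C functions see the whole command. A minimal standalone sketch of such a check follows; the function name and the errno-based reporting are assumptions, since the runtime raises an OCaml exception rather than setting errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Returns 0 if the `len` bytes at `s` contain no NUL byte. Otherwise
   sets errno to EINVAL and returns -1, since a C API would silently
   truncate the command at the first NUL. */
static int check_c_safe(const char *s, size_t len)
{
  assert(s != NULL);
  if (memchr(s, '\0', len) != NULL) {
    errno = EINVAL;
    return -1;
  }
  return 0;
}
```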
This commit adds an explicit test checking that the type scheme of a value printer does not contain any non-generic type variables before it is used to print a value.
> `time` is a wrapper for `_time64` and `time_t` is, by default, > equivalent to `__time64_t`. If you need to force the compiler to > interpret `time_t` as the old 32-bit `time_t`, you can define > `_USE_32BIT_TIME_T`.
stdlib/headernt.c was adapted in OCaml 3.00 to reduce its size by avoiding the use of the CRT and using Windows API functions directly (this is a well-studied trick on Windows, principally as a puzzle for producing tiny binaries).
This got "regressed" slightly in OCaml 4.06, in the complex introduction of wide character support for Windows, as the mingw-w64 incantation required was unclear, so the entry point was changed to wmain, and the size of the header increased.
By switching from wcslen (a CRT function) to lstrlen (a Win32 API function), headernt.c again only requires kernel32.dll.
Additional flags are added for both ld (mingw-w64) and link (MSVC) to squeeze every last byte out of tmpheader.exe. The MSVC version of the header is once again no longer passed through strip, as this was found to be corrupting the executable (and had never been reducing its size anyway).
Already updated to remove the actual test in s.h, since XPG1 (1985) required it and it is therefore part of the Single Unix Specification (1992), but the _WIN32 guard and the loading of s.h are unnecessary.
Add the approx flag to merging of module constraints
- While module type constraints were using the [approx] flag to disable equivalence checking when merging, module constraints were only checked for cyclicity. Now, both use the same logic: the constraint is approximated and then merged in approx mode, where no equivalence check is done. It computes a better skeleton for the approximated signature, as the destructive substitutions are correctly removing the fields.
- Add tests borrowed (and adapted) from https://github.com/oxcaml/oxcaml/pull/4121
- This commit replaces [lookup_module_path ~load:false] with [lookup_module], where the load flag is not set to false. It should not have much impact for well-typed programs, as the loading would happen after the approximation phase anyway.
Add a [merge_type_approx] for approximation of type constraints
This commit introduces a new function to specifically deal with merging type constraints in approximation mode:
- destructive constraints actually remove the constrained field, to prevent incorrect approximation (specifically, incorrect shadowing)
- non-destructive constraints are treated as an identity patch, where the constrained field is replaced by itself. This allows us to reuse the normal merging infrastructure and fail early in case of ill-formed constraints where the field is not present (other forms of ill-formedness are caught later)
The [post_process] function is made aware of the approximation flag to disable wellformedness checks
The [post_process] function for signature merging was taking two linked arguments: a [~destructive] flag and a [replace] function to apply only if the flag was set to [true]. This commit combines the two into a single optional function [replace] and clarifies some documentation comments.
typechecker: fix an internal error due to wrong exception
`Ctype.mcomp` was raising an errortrace when two types were incompatible in two rare cases, whereas the function was specified to raise the `Incompatible` exception. This was fine for internal uses of `mcomp` within `Ctype`, because calls to `mcomp` went through `mcomp_for`, which transformed the `Incompatible` exception into an errortrace. However, this leads to internal errors for other uses of `Ctype.mcomp` that were only expecting an `Incompatible` exception and not an errortrace.
This simple fix replaces the two raises of errortrace by a raise of `Incompatible`.
In https://github.com/ocaml/ocaml/pull/13580#issuecomment-3092253963 jmid reports that he needed to tweak the GC verbosity setting to avoid getting spammed by minor-gc messages when debugging an assertion failure.
The other sub-phases of the minor GC all use `caml_gc_log` rather than CAML_GC_MESSAGE, and do not seem to cause similar spamming issues. Fixing the code to be consistent will avoid inconsistent verbosity levels in end-user scripts.
runtime: free the minor heap when leaving STW participants
The reserved address space for the minor heap area is a global resource shared by all domains; each domain owns a portion of it, within which it commits a part for its minor heap. (Having contiguous address space allows for an efficient Is_young check.) When we need more reserved space because the user increased the minor heap size, we use a STW section to change the reservation: each domain in the STW section first decommits its minor heap, a single domain changes the reserved area, and then each domain re-commits its minor heap.
If a domain does not participate in STW sections, the boundaries of its minor heap will change without the domain decommitting the previous minor heap first. If the same domain structure is used for a newly spawned domain later on, it will start by decommitting its minor heap following the new boundaries, which is incorrect as it never committed this address range in the first place.
(In practice calling `caml_mem_decommit` incorrectly in this way does not appear to crash the program. I think this is because `decommit` has a fairly liberal behavior, it will happily do nothing if the memory range is not committed. The code remains logically wrong, and could become a hard failure if other parts of the runtime change in reasonable ways later on.)
The present commit ensures that we systematically decommit the minor heap of each domain when it leaves the set of STW participants. This way, only STW-active domains have their minor heap allocated, and changing the minor heap address space within STW section works as intended.
(I tried to remove the new call to `free_minor_heap` in `domain_terminate`, and checked that the testsuite fails in debug mode when the `allocate_minor_heap` call in `domain_create` later on notices an already-committed minor heap.)
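To make the address-space layout above concrete, here is a simplified model: one global reservation, split into per-domain segments, with a single range comparison playing the role of the Is_young check. The equal-sized split, the half-open ranges and all names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uintptr_t start, end; } range;  /* half-open: [start, end) */

/* One comparison pair against the global reservation is enough to tell
   whether an address lies in some domain's minor heap area. */
static int is_young(const range *all, uintptr_t p)
{
  return p >= all->start && p < all->end;
}

/* Segment of the global reservation owned by domain `i`, assuming an
   equal split among `ndomains` domains. */
static range domain_reservation(const range *all, unsigned ndomains, unsigned i)
{
  assert(ndomains > 0 && i < ndomains);
  uintptr_t seg = (all->end - all->start) / ndomains;
  range r = { all->start + i * seg, all->start + (i + 1) * seg };
  return r;
}
```

When the global range moves, every segment moves with it, which is why a domain that did not decommit under the old boundaries ends up with stale bookkeeping.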
This fixes a bug in the interaction between polymorphic variants and polymorphic parameters. The actual bug fix is just changing two falses to trues but I also changed the instance_poly interface to make similar mistakes less likely to happen.
The OPTIONAL_BYTECODE_TOOLS, OPTIONAL_NATIVE_TOOLS and OPTIONAL_LIBRARIES should be being used to affect build and installation, not definitions. If ocamltest et al were disabled, then the definitions of these programs were omitted, which prevents the reproducible generation of dependency information.
Running config.status works correctly, but individually requesting links in otherlibs/dynlink did not because the names were specified using a shell variable (i.e. at configure-time) instead of a m4sh variable (i.e. at autoconf-time).
The current codebase uses 'caml_minor_heaps_{start,end}' for the boundaries of a global address space that is reserved, 'dom->caml_minor_heap_area_{start,end}' for a 'minor heap area', a segment of this address space that is owned by each domain, and then finally 'dom->young_{start,end}' for the prefix of this segment that is actually committed and used as the minor heap of each domain. Some comments refer to the latter as the 'minor heap arena', following terminology from the Retrofitting Parallelism into OCaml paper.
On a suggestion by KC, I am trying to make the naming scheme more regular by consistently using 'reservation' for a reserved block of address space:
- Use 'minor heaps reservation' for the global reservation. Its boundaries remain stored in 'caml_minor_heaps_{start,end}' to avoid compatibility issues in third-party code.
- Use 'minor heap reservation' for the per-domain segment of the global reservation. Its boundaries are stored in 'dom->minor_heap_reservation_{start,end}'.
- Use 'minor heap' for the prefix of the minor heap reservation that is actually committed, whose boundaries remain 'dom->young_{start,end}'.
My PR #14158 merged today introduced a bug in the logic to resize the minor heaps reservation. It added the following to the `free_minor_heap_arena` function:
domain_state->minor_heap_wsz = 0;
Doing this is correct when we are freeing the minor heap arena of a domain that is leaving the STW participant set (the focus of #14158); it is also correct in
    int caml_reallocate_minor_heap_arena(asize_t wsize)
    {
      free_minor_heap_arena();
      return allocate_minor_heap_arena(wsize);
    }
which is called to change the size of the memory area, so zeroing it in `free` before setting it in `allocate` is fine. However, it is *not* correct in
        if (allocate_minor_heap_arena(Caml_state->minor_heap_wsz) < 0) {
          caml_fatal_error("Fatal error: No memory for minor heap arena");
        }
      }
This function changes the global minor heaps reservation during a STW event where each domain first deallocates its arena and then reallocates it in the new reservation. The problem is that `free_minor_heap_arena` now changes the value of `Caml_state->minor_heap_wsz` to 0, so the re-allocation that follows will try to allocate a 0-word (in fact a 512-word due to the page-alignment normalization logic) arena.
This bug can only be encountered by calling `caml_update_minor_heap_max`, so it affects few programs.
I see two approaches to fix it:
1. we could remove the zeroing of `minor_heap_wsz`, and instead use the previous check `young_start == NULL && young_end == NULL` to detect uninitialized arenas
2. ... or we do assume that `free_minor_heap_arena` will unset the arena size (which is reasonable), and we preserve the desired size value within the `stw_resize_minor_heaps_reservation` function.
The present commit implements approach (2). I prefer to avoid a situation (as with (1)) where the `free` would leave the state only partially initialized, and it would be important for correctness.
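The bug and approach (2) can be reproduced on a toy model. Everything below is a simplified stand-in: the struct, the function names and the MIN_WSZ constant (modelling the normalization that turns a 0-word request into 512 words) are assumptions, not the runtime's code.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_WSZ 512  /* stand-in for the page-alignment normalization */

typedef struct { size_t minor_heap_wsz; int committed; } dom_state;

/* Freeing also forgets the size, as introduced by #14158. */
static void free_arena(dom_state *d)
{
  d->committed = 0;
  d->minor_heap_wsz = 0;
}

static int allocate_arena(dom_state *d, size_t wsz)
{
  assert(!d->committed);
  d->minor_heap_wsz = wsz < MIN_WSZ ? MIN_WSZ : wsz;
  d->committed = 1;
  return 0;
}

/* Buggy: reads the size after free_arena has already zeroed it. */
static int resize_buggy(dom_state *d)
{
  free_arena(d);
  return allocate_arena(d, d->minor_heap_wsz);
}

/* Approach (2): remember the desired size across the free. */
static int resize_fixed(dom_state *d)
{
  size_t wsz = d->minor_heap_wsz;
  free_arena(d);
  return allocate_arena(d, wsz);
}
```

Starting from a 4096-word arena, the buggy variant silently shrinks it to 512 words, while the fixed variant keeps 4096.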
Memprof.start replaces any existing profile in the domain, Memprof.is_sampling
The change to Memprof.start increases its compositionality while conforming to the previous behaviour (it simply fails in fewer situations). This is necessary for us to implement the Memprof interface on top of the package memprof-limits.
The new function is_sampling is for clients that do want to fail early, e.g. when detecting that two Memprof clients are interfering.
runtime: host aligned fibers inside the fiber cache whenever possible (#14169)
* When growing a fiber, zero the alignment word before computing the next size so that this new size fits inside the fiber cache.
* Add an assertion to check that small fibers are using the cache.
Add runtime counters EV_C_MINOR_PROMOTED_WORDS and EV_C_MINOR_ALLOCATED_WORDS.
EV_C_MINOR_PROMOTED_WORDS reports words promoted by minor GC and EV_C_MINOR_ALLOCATED_WORDS reports words allocated by minor GC. Both have equivalent bytes counters.
Update the documentation for EV_C_MINOR_PROMOTED and EV_C_MINOR_ALLOCATED to qualify scope of the values reported.
Higher-level error messages for functors recompute inclusion checks when trying to discover more macro-level error messages. For this reconstruction to be accurate, those computations must use the same environment as the one used when detecting the original problem.
In particular, this environment must include equalities added during the pairing of types and modules during the signature inclusion test. For instance, in
    module M: sig
      type t
      module F(X:sig val f:t val g:int end): sig end
    end = struct
      type t
      module F(X:sig val f:t val g:float end)= struct end
    end
we must remember that the interface-side `t` is equal to the implementation-side `t`.
This part of the inclusion checking environment was ignored before this commit, leading to nonsensical error messages complaining that `t` is not compatible with `t`.
This commit extends the captured environment for errors in signatures to include the substitution recording the equalities between items on both sides of the check.
In mingw-w64 13.0.0, time.h now causes pthread_compat.h to be included, which as a side effect sets up the macros for declspec(dllimport). Since caml/osdeps.h uses time.h, this means that the macro changes designed to ensure the API functions are properly decorated don't get applied, and RELOC_REL32 errors abound.
The fix for now is to ensure that the macros are set up to control pthread_compat.h at the very beginning of the file.