P, Q ::=      let x = E in P          compute expression
            |   let f (x) = E in P      local rec func definition
            |   P | Q                   parallel composition
            |   0                       inert process

  E    ::=      P
            |   run P                   spawn proccess


"join calculus" and JoCaml

by Sergei Winitzki

#def fib (n) =
       if n <= 1
       then reply 1 to fib
       else reply fib (n - 1) + fib (n - 2) to fib
val fib : int -> int = <fun>
#print_int (fib 10)
- : unit = ()

JoCaml is an extension of OCaml that supports a novel model of concurrency called "join calculus"



  • 1.1 from sequential to concurrent programming
  • 1.2 the "abstract chemical machine"
  • 1.3 using the chemical machine for computations
  • 1.4 example
  • 1.5 summary
  • 2.1 synchronous ("instant") molecules
  • 2.2 defining several "reactions" together
  • 2.3 using molecules in OCaml expressions
  • 2.4 "molecule" definitions in local scopes
  • 2.5 interactions between different chemical machines
  • 2.6 how to understand the JoCaml manual and other literature
  • 3.1 Some limitations of JoCaml
  • 3.2 Background jobs
  • 3.3 Waiting forever
  • 3.4 Parallel processing of lists
  • 3.5 Waiting for completion of many jobs
  • 4.1 Other limitations of join calculus
  • 4.2 Evaluate many functions in parallel and store results
  • 4.3 Sum of the elements in an array
  • 4.4 Parallel merge-sort algorithm
  • 4.5 Mixing global and local molecules?
  • 4.6 What you learned so far
  • 5.1 Formulation of the problem
  • 5.2 First solution
  • 5.3 Second solution
  • 5.4 Discussion
  • 6.1 Static and dynamic molecules
  • 6.2 "reaction" constructors
  • 6.3 More general constructors?
  • 6.4 Running, monitoring, and canceling a job

    from sequential to concurrent programming

    ordinary, non-concurrent (sequential) programming is viewed in OCaml as evaluation of expressions, possibly with side-effects. expressions always have a result value. roughly speaking, an OCaml program is a single (but large) expression that needs to be evaluated

    the usefulness of the program is, in most cases, not in delivering this single result value, but in doing something else (producing "side effects") while evaluating the large expression. nevertheless, computation proceeds by evaluating an expression

    given this view of computation, what should a "concurrent" program do?

    it should evaluate several expressions in parallel, - that is, concurrently and, perhaps, in an arbitrary order. the task now is to initiate and to organize this kind of computation

    the "Abstract Chemical Machine"

    imagine that you have a large tank of water where many different chemical substances are dissolved. many different chemical "reactions" then become possible in this "soup". you may denote the participants of these "reactions" by a, b, c, ... - let call them "molecules"

    you imagine that there can be many copies of each "molecule" in the "chemical soup". for example, the "soup" can contain five hundred copies of a and three hundred copies of b, etc

    this "chemical machine" is abstract, - in other words, you will be able to define arbitrary "laws of chemistry". specifically, you can postulate the existence of any kind of molecules a, b, c, d, ..., and write down all kinds of reactions, for example a + b -> c + d

    the JoCaml's conventions:

    here are some simple reactions:

    def a () & b () & c () = d () ;;

    def c () = a () & b () & d () ;;

    reaction 1 means that a, b, c instantaneously disappear and then d appears. "reaction" 2 means that c disappears and then a, b, d appear

    any given "reaction" can happen only when all of its input molecules are present in the soup

    for example, "reaction" 1 can start only when at least one instance of each of the molecules a(), b(), c() are present in the soup. "reaction" 2 can start whenever at least one instance of the c() "molecule" is present

    the output "molecules" are injected all together in the "soup", at once

    we cannot predict exactly when particular "reactions" will start. all you can say is that "reactions" will start at a random or unpredictable times after injecting the "molecules" into the "soup"

    suppose that initially the "soup" contains five instances of the "molecule" c() but no other molecules. "reaction" 1 cannot start since it requires molecules a() and b(), but you have only c(). "reaction" 2 could start with any of the five copies of c() or even with all of these c's - in parallel. however, it is not guaranteed that five parallel processes with "reaction" 2 will start immediately. so it is quite possible that one instance of c() is converted to a() & b() & d() by "reaction" 2, while some other instances of c() remain in the "soup" because "reaction" 2 has not yet started for them. in that case, "reaction" 1 could start

    what you observe is that first, "reaction" 2 proceeds, and then, at some unknown time after a() & b() were injected into the "soup" by "reaction" 2, the machine starts "reaction" 1

    in order to use the "abstract chemical machine", you need to provide a complete description of the "chemical laws". for that, you need to write down two things:

    the machine takes the initially injected molecules and runs for a while. some "reactions" occur, perhaps producing new "molecules", which then initiate new reactions, and so on. depending on the specific "chemical laws", the machine could run "reactions" forever, or stop. some "reactions" can become impossible if the required "molecules" disappear from the "soup" and never appear again. you need to keep in mind that the order and the timing of the "reactions" is unpredictable

    consider a "chemical machine" with the following four reactions. "reactions" involving the same input molecules must be defined together, and joined using the keyword or

      def a () & b () = c ()
        or b () = a ()
        or c () = d ()
        or d () = c () ;; 

    suppose that initially the "soup" contains a() & b(). if the "reaction" a() & b() = c() occurs first, you will just have c() in the soup. once you have c(), there will be infinitely many transitions between c() and d(), so the chemical machine will never stop

    on the other hand, it may be that the "reaction" b() = a() occurs first. then you will have two a's in the soup, and no further "reactions" will ever occur unless some new molecules are injected

    this is an example of a program that permits the machine to either stop or go on forever, depending on the (unpredictable) choice of the initial reaction. most probably, such a program is not what you would want

    using the chemical machine for computations

    rather than just watch "reactions" happen, you would like to use this machine for running some computations. the basic idea is this: you are going to give the machine some extra expressions to be computed whenever "reactions" occur

    the JoCaml system implements the following features in the "chemical machine":

    that's all, only two constructions - def and spawn! nothing more to learn about the join calculus and JoCaml:)

    to summarize:

    the "chemical machine" requires two parts for its description - the list of "chemical reactions" and the list of "initial molecules" injected into the soup

    the JoCaml language is an extension of OCaml that adds only two new keywords:

    it is not necessary (and in fact not possible) to define the names of the "molecules" separately from the reactions: the JoCaml system automatically defines "molecules" for any symbols that are listed on the left-hand side of a def. the symbol & is used to separate "molecules" in reactions, and also for injecting several "molecules" within a single spawn construction


    let us run "reactions" in the JoCaml system. here is the output (one needs a double semicolon to force evaluation in the JoCaml interpreter)

    $ jocaml
            JoCaml version 3.12.1
    # def c(z) = Printf.printf "reaction 4 consumes c(%d)\n" z ; 0 ;;
    val c : int Join.chan = < ;abstr>
    def a(x) & b(y) = let z = x + y in ( Printf.printf "reaction 1 yields c(%d)\n" z ; c(z) ) ;;
    val a : int Join.chan = < ;abstr>
    val b : int Join.chan = < ;abstr>
    # spawn a(2) & b(3) & c(4) ;;
    - : unit = ()
    # (* press Enter to flush I/O buffers *)
    reaction 4 consumes c(4)
    reaction 1 yields c(5)
    reaction 4 consumes c(5)

    let us look at the types of the expressions inferred by JoCaml

    after defining the "reactions" for a (x), b (x), c (x), you see that JoCaml has defined values a, b, c of type int Join.chan. this type is an instance of the polymorphic type 'a Join.chan, which is the type for molecule constructors. a "molecule" constructor is a JoCaml value of type 'a Join.chan and can be understood as a special kind of function that takes a value of type 'a and produces a molecule decorated with that value. in the above example, c is a int Join.chan

    the type int was inferred automatically - it is forced by the expression x + y

    in the spawn statement, you applied the function c to an integer "4" and obtained a copy of the "molecule" c(4)

    now, it is important to realize that a fully constructed molecule, such as c(4), is not an OCaml value. you cannot write let x = c(4) in ... or arr.(0) <- c(4). Generally, you cannot use c(4) in any OCaml data structure or expression. If you try, you get the message that the "molecule" constructor, c, is not a function and "cannot be applied". you are allowed to write c(4) only under spawn and when defining reactions

    a fully constructed molecule, such as c(4), does not have an OCaml type

    the bare c, i.e. the molecule constructor, is an OCaml value of type int Join.chan

    the "molecule" constructors can be stored in data structures, returned as result values by functions, and generally manipulated like any other OCaml values

    once you have defined a reaction, you have also automatically defined the "molecule" constructors for all the input molecules:

    # def a(x) & b(y) = let z = x + y in a(z) ;;
    val a : int Join.chan = < ;abstr>
    val b : int Join.chan = < ;abstr>
    # let p = a ;;
    val p: int Join.chan = < ;abstr>
    # let q = a(3) ;;
    Characters 8-9:
      let q = a(3) ;;
    Error: This expression is not a function ; it cannot be applied

    note that the "molecule" constructors for the output "molecules" cannot be defined separately ; all output "molecules" need to be known at the time of defining the reaction, or they need to be defined as input "molecules" in the same def statement. here is what happens if you do not define output "molecules" but use them in a new reaction:

    # def a(x) & b(y) = c(x + y) ;;
    Characters 18-19:
      def a(x) & b(y) = c(x + y) ;;
    Error: Unbound value c
     (* let's define c first, then the "reaction" with a,b,c *)
    def c(x) = 0 ;;
    val c : 'a Join.chan = < ;abstr>
    # def a(x) & b(y) = c(x + y) ;;
    val a : int Join.chan = < ;abstr>
    val b : int Join.chan = < ;abstr>

    as you will see below, it is possible to define several "reactions" at once, so that each "reaction" is using "molecules" of other reactions

    summary so far

    what have been just described is the view of concurrent computations known as the "join calculus"

    you first define all the "reactions" and the "molecules" allowed in our "abstract chemistry". you also define the computational payloads for reactions

    then you start the machine and inject some initial "molecules" into it. the machine creates and synchronizes all the required concurrent processes for us. the processes wait until suitable "molecules" are present in the soup, then automatically evaluate the payload expressions, then perhaps go to sleep or create new processes, according to the custom rules of the "abstract chemistry"

    computed values can be passed from one process to another through as "decoration values" of the molecules. in this way, a complicated system of interacting concurrent processes can be specified as a particular "chemical machine"

    in order to implement a particular concurrent program in the "join calculus", you must somehow invent the appropriate "molecules" and design "reactions" that will create and destroy the "molecules" in such a way that the concurrent payload computations are coordinated correctly

    synchronous ("instant") molecules

    Normally, all "reactions" occur asynchronously, that is, at random times. Suppose you define this simple reaction,

    # def a (x) & b (y) = print_int (x + y) ; 0 ;; val a : int Join.chan = <abstr> val b : int Join.chan = <abstr>

    The JoCaml system said that now two "molecules" are defined. What the JoCaml output does not say is that a reaction was defined as well, i.e. the chemical machine stored the payload expression print_int (x + y) in its internal memory and prepared to evaluate this expression when the "molecules" are present

    you can now inject some a and b "molecules" into the soup:

    # spawn a (1) & b (2) ;; - : unit = ()

    The injection of these "molecules" is a non-blocking function. In other words, the spawn expression returns right away with the empty value, and our program continues to run. Perhaps, quite soon the "reaction" will start, the payload expression will be computed, and you will get "3" printed as a side effect of that. However, this computation is asynchronous - you will have to wait for an unknown duration of time until the "3" is printed. If the chemical machine is very busy running lots of other reactions, you might have to wait a long time until our payload computation even begins

    But JoCaml introduces a special kind of "molecules" that are synchronous. Perhaps a fitting description of them is "instant molecules". When you inject an "instant molecule" into the soup, the chemical machine will immediately check whether any "reaction" involving this "molecule" is possible. If yes, the machine will immediately run this "reaction" and deliver the result value to the injecting expression. If no "reaction" is possible with this molecule, the injecting call will wait until a "reaction" becomes possible (e.g., until some other required "molecule" appears in the soup). Thus, injecting an "instant molecule" into the "soup" is a blocking function call

    To summarize, instant "molecules" are different from the usual (asynchronous, or "slow") "molecules" in these respects:

    How does JoCaml define the result value of an instant molecule?

    Let us imagine a "reaction" where the "molecule" a(x) is "slow" and the "molecule" s(y) is "instant". you would like the "reaction" between a and s to produce some other slow "molecules" b and c, and at the same time you would like the "molecule" s to return the value x + y. Our wish can be written informally (i.e., not yet with the correct JoCaml syntax) as

    a(x) & s(y) = b & c & (s returns x + y)

    Suppose there is initially no a "molecule" in the soup. you want to use this "reaction" by evaluating an expression containing the "molecule" s, say the expression s(3)*7. In this expression, s(3) looks like an ordinary (i.e. synchronous) function that blocks until some other "reaction" injects, say, a(2) into the soup. Then s(3) will return the value "5", while the "molecules" b and c will be injected into the soup. The entire expression s(3)*7 evaluates to "35"

    JoCaml allows us to implement this "reaction" by using the following syntax,

    def a (x) & s (y) = b () & c () & reply x + y to s ;;

    Once you have used the construction reply ... to with the name s, the JoCaml system recognizes the "molecule" s as synchronous ("instant")

    Despite the English significance of the keywords reply ... to, the value x + y is not our "reply to s", neither is this value "sent to s". This value is actually sent to the expression that called s(3) as if s(3) was a function call. It is perhaps easier to remember the correct semantics by writing something like "finish s returning x + y"

    It is important to note that the instant "molecule" "s" will not be present in the "soup" after the "reaction" is finished. It does not make sense for "s" to remain in the soup. If a "molecule" remains in the "soup" after a reaction, it means that the "molecule" is going to be available for some later "reaction" without blocking its injecting call ; but this is the behavior of an asynchronous molecule


    def b () = print_endline "b is produced" ; 0 ;;
    def c () = print_endline "c is produced" ; 0 ;;
    def a (x) & s (y) = b () & c () & reply (x + y) to s ;;
    spawn a (56) ;;
    print_int (s(45)) ;; 

    To summarize:

  • Instant "molecules" are like functions except that they will block when their "reactions" are not available
  • If the relevant "reaction" never becomes available, an instant "molecule" will block forever
  • If several "reactions" are available for the instant molecule, one of these "reactions" will be selected randomly

    Defining several "reactions" together

    Suppose you want to define several "reactions" involving the same "molecules" ; for example, the two "reactions" both involving a(x). JoCaml requires, in this case, that you should define these two "reactions" simultaneously, using the or keyword:

    def a (x) & b (y) = c (x + y)
     or a (x) & c (y) = a (x + y) ;; 

    these two "reactions" cannot occur in parallel on the same copy of the "a" molecule, because a "reaction" first consumes its input "molecules" and then produces the output molecules

    this is the central feature of "join calculus" that replaces synchronizations, locks, semaphores, and other low-level features

    keep in mind that these two definitions actually specify not only the two "reactions" but also the fact that a, b, and c are (asynchronous) "molecules". this is because a, b, and c all occur on the left-hand side of the definitions. whatever occurs on the left-hand side of a def becomes a newly defined symbol. thus, these two lines of JoCaml will define, at once, two "reactions" and three "molecules"

    Why is it necessary to define these two "reactions" together? This is because JoCaml defines new "molecules" only when they occur on the left-hand side of a reaction. If you define the reaction

    def a(x) & b(y) = c(x + y)

    alone, JoCaml will see that a and b must be molecules, but JoCaml has no way of guessing what kind of "molecule" c must be. Maybe c is an "instant molecule", maybe a "slow molecule", or maybe an ordinary function? There is not enough information for JoCaml to compile this "reaction" definition alone. Here is what happens if you try this in JoCaml:

    # def a (x) & b (y) = c (x + y) ;;

    Error: Unbound value c

    you could try to get around this problem: first define the reaction a(x) & c (y) = a (x + y) , which will define both "a" and "c" as molecules, and then define a (x) & b (y) = c (x + y), but this will not work as you wanted. The problem is that each def construct defines new "molecules" on the left-hand side, just like each let construct defines a new variable

    # def a (x) & c (y) = a (x + y) ;;
    val a : int Join.chan = <abstr>
    val c : int Join.chan = <abstr>
    # def a (x) & b (y) = c (x + y) ;;
    val a : int Join.chan = <abstr>
    val b : int Join.chan = <abstr>

    JoCaml tells us that new "molecules" a and b were defined after the second reaction. The definition of a from the first "reaction" is shadowed by the second definition of a. This is quite similar to the behavior of the let construct: let x=1 ;; let x=2 ;; defines a new variable x each time. For this reason, you need to define the two "reactions" together, connecting them by the or construct

    you can also use the and keyword to define several "reactions" together, but only when the left-hand sides of all the "reactions" involve all different input molecules:

    def a(x) & b(y) = c (x + y)
    and c (x) = print_int x ; a (x) ;;

    In this case, it is necessary to define the two "reactions" together: the first "reaction" uses the c "molecule" as output, and you have to define it as a molecule. you could define the c "reaction" first, but it uses the a "molecule" as output. Defining a "reaction" a (x) & b (y) = c (x + y) alone will be an error since c(...) is undefined. JoCaml has no separate facility for defining a "molecule" without also defining a reaction. So these two "reactions" must be defined simultaneously. This is similar to defining mutually recursive functions

    you could define these two "reactions" using the or keyword as well. There is no difference in the resulting behavior. However, you can use the or / and keywords in order to make clear our intention of defining "reactions" on different input "molecules" or the same input molecules

    The JoCaml compiler will refuse a program that defines "reactions" on the same input "molecules" using the keyword and:

    # def a (x) & b (y) = c (x + y) and a (x) & c (y) = a (x + y) ;; Error: Variable a is bound several times in this matching

    The error message talks about "matching". Actually, the "reaction" definitions are treated as pattern-matching operations. The standard OCaml facility for pattern matching is also available for defining reactions. For instance, the decorating values could have a variant type, a list, etc., and a "reaction" can require a particular matching value. An example with the Option type:

    def a (Some x) & b (y) = b (x + y) or a (None) = 0 ;;

    In this case, you are allowed to define two different reactions, and the "reaction" will be chosen according to the decorating value of the "molecule" a

    In JoCaml, the most general form of a "reaction" definition is

    def (R1 or R2 or R3 or ...) and (R4 or R5 or ...) and ...,

    where R1, R2, etc. are various reactions. All the "reactions" connected by the and keyword must use different input molecules. It is admissible to connect all "reactions" with the or keyword, but then you ignore the compiler's ability to verify that some "reactions" must have different input molecules

    Using "molecules" in OCaml expressions

    JoCaml is a superset of OCaml and uses the existing OCaml type system. What are the OCaml types of the molecules?

    A "molecule" has the syntactic form of a function call with a single argument, like a(2). Here "2" is the "decorating value" of the molecule. Molecules without decorating values are not allowed, so you need to use the empty value (), e.g. b ()

    Now, you have two kinds of molecules: "instant" and "slow". Their roles in the OCaml type system are quite different

    "instant molecules" have the type of functions, returning a value. If s(2) is an instant "molecule" returning an integer then s(2) has the type int and so s has the type int->int

    However, the story becomes more complicated with "slow" (i.e. asynchronous) molecules. a "slow molecule" such as b (2) is not an OCaml value at all "slow molecules" cannot be used in ordinary OCaml expressions - they cannot be, say, stored in an array or passed as arguments to functions

    # def b(x) = print_int x ; 0 ;; val b : int Join.chan = <abstr> # b(2) ;; Characters 0-1: b(2) ;; ^ Error: This expression is not a function ; it cannot be applied # b ;; - : int Join.chan = <abstr>

    you see that the b itself is an ordinary OCaml value that has the type int Join.chan. But this type is not a function type, so b(2) is not a valid OCaml expression

    Thus, you can regard b as the "constructor" for slow "molecules" of the form b(x). The OCaml value b can be passed to functions, stored in an array, or even used as the "decorating value" of another molecule. for example, this code creates a "higher-order" slow "molecule" that injects two copies of a given other "molecule" into the soup:

    # def a (b, x) = b (x - 1) & b (x - 2) ;; val a : (int Join.chan * int) Join.chan = <abstr>

    Here you are using a tuple as the decorating value of the "molecule" a. The "molecule" a contains the constructor of the "molecule" b in its decorating value. Since the constructor b is available, the "reaction" can inject the "molecule" b(x) into the "soup" as its output

    In this way one can also implement a "continuation-passing style", passing "molecule" constructors to "reactions" that will inject new molecules

    While "molecule" constructors are ordinary OCaml values and can be used in any expression, a fully constructed "slow molecule" like a(2) is not an OCaml value and can be used only within a "molecule expression". This is a new, special kind of expression introduced by the JoCaml system

    A "molecule expression" describes a "slow molecule" (or, more generally, a set of "slow molecules") that are being injected into the soup, together with result values of "fast molecules". Therefore, a "molecule expression" may be used only where it is appropriate to describe newly injected molecules: either after spawn or on the right-hand side of a def. Result values of "fast molecules" can be used only within a def

    Note that spawn NNN is an ordinary OCaml expression that immediately returns the empty value (), while def is by itself not an expression but a binding construction similar to let. The right-hand side of a def is a "molecule expression" that describes at once the payload expression and the output "molecules" of a reaction

    A "molecule expression" can be:

    Reactions and "molecules" can be defined locally, lexically scoped within a "molecule expression". For this, the construction def ... in is used. for example:

    spawn (def a (x) = b (x) in a (2)) ;;

    The chemical machine will, of course, know about all defined "reactions" and molecules, including a. But the OCaml program will not see a outside this spawn expression

    Molecule definitions in local scopes

    The let ... in and def ... in constructions are similar in that they define local names. However, there are special considerations for the def ... in expr construction. In brief: this construction defines local molecules, which are then invisible outside the expression "expr", but these "molecules" and their "reactions" continue to run in the soup. (The "chemical machine" always knows about all "molecules" and reactions, local or not.)

    As an example, let us try to implement an "asynchronous counter": you will increment a value every time a "reaction" starts, but you want to make this "reaction" local. you want to define a slow molecule, inc (), which asynchronously increments the counter. However, you want to protect the counter from modification by any other means

    Our first idea is to define the counter as a reference but to hide this reference using a let definition. The first try does not work:

    # let i = ref 0 in def inc () = incr i ; 0 ;;
    Syntax error

    the problem is that let i = expr works only when "expr" is an expression, but def ... is by itself not an expression. you can try to fix this:

    # let i = ref 0 in def inc () = incr i ; 0 in () ;;
    - : unit = ()
    # spawn inc () ;;
    Error: Unbound value inc

    now you don't have a syntax error, but the result is still disappointing: the "molecule" inc is invisible outside the expression where it was defined locally. So you can try to define it globally, creating a reference inside a reaction:

    # def inc () = let i = ref 0 in incr i ; print_int !i ; 0 ;;
    val inc : unit Join.chan = <abstr>
    # spawn inc () ;;
    - : unit = ()
    #  (* Enter *)
    spawn inc () ;;

    This does not work because the reference is initialized each time the "reaction" starts ; this is similar to defining a reference locally within a function body, a classical blunder

    Another approach is to abandon the idea of using references, and instead to hold the value of the counter within a molecule. This solution is then purely functional. you will also implement a fast molecule, "getcounter ()", which returns the current value of the counter synchronously

    # def inc () & c (n) = c (n + 1)
      getcounter () & c (n) = c (n) & reply n to getcounter ;;
    val inc : unit Join.chan = <abstr>
    val getcounter : unit -> int = <fun>
    val c : int Join.chan = <abstr>
    # spawn c(0) & inc () & inc () & inc () ;;
    - : unit = ()
    # getcounter () ;;
    - : int = 3

    This works but has a drawback: the counter "molecule" c(0) must be injected by hand and remains globally visible. If the user injects two copies of c with different values, the inc and getcounter "reactions" will work unreliably, choosing between the two copies of c at random. you would like to hide this molecule, so that the user will be unable to inject more copies of this molecule

    A clean solution requires us to create the "molecule" c locally but to make the "molecules" inc and getcounter globally visible. This can be accomplished if you return the "molecule" constructors inc and getcounter as a result value of a closure that defines and injects the "molecule" c internally

    # let make_async_counter init_value = 
       def inc () & c (n) = c (n + 1) 
       getcounter () & c (n) = c (n) & reply n to getcounter
       spawn c (init_value) ;
       (inc, getcounter) ;;
    val make_async_counter : int -> unit Join.chan * (unit -> int) = <fun>
    # let (inc, getcounter) = make_async_counter 0 ;;
    val inc : unit Join.chan = <abstr>
    val getcounter : unit -> int = <fun>
    # spawn inc () & inc () & inc () ;;
    - : unit = ()
    # getcounter () ;;
    - : int = 3

    Now the "molecule" c is safely hidden. It is guaranteed that only one copy of c will ever be present in the soup: Since this "molecule" is locally defined and not visible outside the closure, the user of make_async_counter is unable to inject any more copies of c. However, the user receives the "molecule" constructors getcounter and inc, thus the user can inject these "molecules" and start their "reactions" (despite the fact that these "molecules" are locally defined, like c). Each invocation of make_async_counter will create new, fresh "molecules" inc, getcounter, and c, so the user may create as many independent counters as desired

    This example shows how you can "hide" some "molecules" and yet use their reactions. A closure can define local "reaction" with several input molecules, inject some of these "molecules" initially, and return some (but not all) "molecule" constructors to the global scope outside of the closure

    Interactions between different chemical machines

    The final feature of JoCaml is the possibility of running distributed computations. You can run many independent instances of the "chemical soup machine" on the same physical computer, or on several different computers. (One JoCaml program can run only one "soup", but you can run several compiled JoCaml executables.)

    Different JoCaml instances can connect to each other and interact in the following manner:

    Suppose you wanted to inject a "molecule" such as a(2) into a "remote soup". But what if the "remote soup" does not even have a "molecule" called a? The mechanism provided by JoCaml is that you first need to obtain a molecule constructor, a, from the remote chemical machine. This is done through the "molecule name server" mechanism. Each chemical machine has its own "molecule name server" and can register some "molecules" with it. Another program can then query the server and obtain "molecule" constructors by name

    Here are the typical use cases:

    How to understand the JoCaml manual and other literature

    my terminology in this tutorial differs from that in the official JoCaml documentation and in other tutorials about the join calculus (here and here). I talk about the "chemical machine", "molecules", "reactions", "injecting molecules into the soup", "instant molecules", and "slow molecules". the official documentation, like most papers and tutorials about the join calculus, talks about "channels", "ports", "messages", and "processes". I do not use the official terminology because, in my view, it is much easier to learn the join calculus in the "chemical" metaphor than through the "channel/message/process" metaphor

    here is a dictionary of terminology:

    official manualthis tutorialexamples
    channel name
    port name
    constructor of a "molecule" r : 'a Join.chan
    messagea fully constructed "molecule" r (2)   (* not an OCaml value *)
    sending a message on channelinjecting a "molecule" into the "soup" spawn r (2) ;;
    there is a message on channel ra "molecule" is present in the "soup" r (2) has been injected
    asynchronous channel/message"slow molecule"def r (x) & a () = b (x) & a () ;;
    process"reaction"a thread that computes a reaction's function body
    join definitiondefinition of a reaction def r(x) & s () = print_int x ; r(x + 1) ;;
    spawning a processinjecting "molecules" into the "soup" spawn r (2) & s () ;;
    spawning a processemitting an output "molecule" after reactionright-hand side of a def contains a "molecule" expression
    for the output "molecules"
    synchronous channel/message"instant molecule" def f () & b(x) = reply x to f & b (x + 1) ;;

    the official manual says:

    channels, or port names, are the main new primitive values of JoCaml

    users can create new channels with a new kind of def binding, which should not be confused with the ordinary value let binding. the right hand-side of the definition of a channel a is a process that will be spawned whenever some message is sent on a. additionally, the contents of messages received on a are bound to formal parameters

    processes are the new core syntactic class of JoCaml. the most basic process sends a message on an asynchronous channel. since only declarations and expressions are allowed at top-level, processes are turned into expressions by “spawning” them: they are introduced by the keyword spawn

    in my terminology, "channels" or "port names" are called "constructors for molecules", and "processes" are called "reactions". "sending message on a channel r" means injecting a molecule, such as r(2), into the "soup". so "sending messages on a channel" does not imply that you have a queue of messages on the channel, or that the messages are retrieved one by one. "molecules" are not queued in any particular order, - they are simply injected into the "soup" all together. "reactions" can start if all the required "molecules" are present in the "soup", or "when there are messages on the required channels"

    the "contents of messages" mentioned in the manual is what I call the "decoration value" of the "molecule". so, when the manual says that "we send a message with content 2 on channel r", I say "we inject the molecule r(2) into the soup"

    when the manual says "process is spawned", I say "reaction is started"

    a "running process" is the running computation of a "reaction's" payload function

    In my view, it is unfortunate that the keyword spawn has been chosen for the operation of injecting the molecules. evaluating a spawn expression will not necessarily START any computations or SPAWN any new processes or threads. "molecules" by themselves cannot perform calculations ; only after a "reaction" between (perhaps several) "molecules" has started, the chemical machine will perform a calculation associated with that reaction

    consider the "reaction" definition

    def a (x) & b (y) = print_int x + y ; 0

    this is not a definition of the "channel a" or a "channel b" alone. this line of JoCaml code defines at once the "molecule" constructors a and b and a "reaction" between the two molecules

    note that the "reaction" will start only when both "molecules" a and b are present in the soup. if "a message arrives on channel a", that is, if only the a(x) "molecule" is present but not b(y), the "reaction" will not start.

    from the user's perspective, the channel is not "waiting" for messages to "arrive". instead, the "reaction" itself is waiting until some a(x) and b(y) "molecules" are injected into the soup, either as products of another "reaction" or by an explicit spawn

    there is one (perhaps accidental) aspect of JoCaml that is better described by the "channel" metaphor. namely, you are not allowed to define a "reaction" with several copies of the same "molecule", such as a (x) & a (y) & a (z) = b (x + y + z). a straightforward application of the "chemical" intuition would allow this kind of reaction: in real chemistry, such "reactions" are, of course, commonplace. now, the "channel/message" metaphor implies that there can be only one channel named a. you may send three messages on this channel, but these three messages cannot be processed all at once. this analogy suggests that "reactions" with several copies of the same "molecule" ("port") are not allowed

    actually, the reason such "reactions" are not allowed is technical: the implementation of the chemical machine is simpler and more efficient if only one copy of each "molecule" is allowed to participate in a reaction. in principle, one could implement a more complicated chemical machine that permits such reactions. however, this limitation is not essential and can be easily circumvented by defining extra "molecules"


    First steps in concurrent programming

    Let us see how the "abstract chemistry" needs to be designed if you would like to perform some common tasks of concurrent programming. you will think in terms of "molecules" and "reactions" rather than in terms of "threads", "processes", "channels", or "messages". In this way you will learn how to use the join calculus for designing concurrent software

    The join calculus has only two primitives:

  • defining a chemical "reaction" between molecules, -- keyword def,
  • injecting of "molecules" into the soup, -- keyword spawn

    Computations are imposed as "payload expressions" on running reactions. Thus, you organize the concurrent computation by defining the possible "reactions" and injecting the appropriate "molecules" into the soup

    The JoCaml notation for "reactions" is actually so concise that one can design the "reactions" directly by writing executable code!

    (To keep things clear in your head, think "define reaction" instead of "def" and "inject molecules" instead of "spawn". Keep in mind that a "def" defines a "reaction" together with new input molecules. The "spawn" is not necessarily "spawning" any background threads ; a "spawn" merely adds "molecules" to the soup, which may or may not start any reactions.)

  • Some limitations of JoCaml

    While designing the "abstract chemistry", you need to keep in mind certain limitations of the JoCaml system:

  • you cannot detect the absence of a given slow molecule, say a(1), in the soup. This seems to be a genuine limitation of join calculus

    It seems that this limitation cannot be lifted by any clever combinations of slow and instant "molecules" ; perhaps this can be even proved formally, but I haven't tried learning the formal tools for that. I just tried to implement this but could not find appropriate reactions. For instance, you could try injecting a fast "molecule" that reacts with "a". If "a" is absent, the injection will block. So the absence of "a" in the soup can be translated into blocking of a function call. But it is impossible, within any programming language, to detect whether a function call has been blocked, because the function call is by definition a blocking call! All you can do is to detect whether the function call has returned within a given time, but here you would like to return instantly with the information that "a" is present or absent.

    Suppose you define a "reaction" using the "molecule" "a", say "def a () = ...". If this "reaction" does not begin at a certain time, you cannot conclude that "a" is absent in the soup at that time! you can prepare another "reaction" to act like a "timeout", so that you can detect the fact that the "def a () = ..." did not start within, say, 5 seconds. But this also does not mean that the "molecule" "a ()" was absent from the soup during these 5 seconds. It could be that "a ()" was present but got involved in some other "reactions" and was consumed by them, or that "a ()" was present but the computer's CPU was simply so busy that the "def a () = ..." "reaction" could not yet start and is still waiting in the queue

    The chemical machine could generate a special slow molecule, say, stalled (), to be injected when the chemical machine finds that currently no further "reactions" are possible. One could perhaps easily implement this extension to the chemical machine, which might be sometimes useful. But this is a crude mechanism, and you will still be unable to detect the absence of a particular slow "molecule" at a given time

    Another solution would be to introduce "inhibiting" conditions on reactions: a "reaction" can start when "molecules" "a", "b", "c" are present but no "molecule" "d" is present. However, I am not sure that this extension of the join calculus would be useful. The solution based on a "timeout" appears to be sufficient in practice

  • Chemical soups running at different JoCaml instances are completely separate and cannot be "pooled"

    What you would like to do is to connect many chemical machines together, running perhaps on different computers, and to pool their individual "soups" into one large "common soup". Our program will then be able to inject lots of "molecules" into the common pool and thus organize a massively parallel, distributed computation, without worrying about which CPU computes what reaction. However, the JoCaml system can only inject "molecules" to a specific remote soup. So, in order to organize a distributed computation, you need to split the tasks explicitly between the participating soups. The organization and supervision of distributed computations, the maintenance of connections between machines, the handling of disconnections - all this remains the responsibility of the JoCaml programmer and cannot be handled automatically

  • JoCaml uses the same single-threaded garbage collector as OCaml. Thus, JoCaml cannot automatically use all cores on a multicore machine. One can run several instances of a JoCaml program on the same computer, and the OS will perhaps assign these processes to different cores. But then the programmer must explicitly distribute the computation between individual JoCaml instances

    Acknowledging these limitations, and hoping that (especially) the last two problems are technical and will be solved in the future, let us proceed to examine various common tasks of concurrent programming from the "chemical" perspective

  • Background jobs

    A basic asynchronous task is to start a long background job and get notified when it is done

    A chemical model is easy to invent: you define a "reaction" with a single slow "molecule" and a payload. The "reaction" will consume the "molecule" and stop

    def a () = do_some_work () ; 0

    The "reaction" starts whenever the "molecule" "a ()" is injected into the soup. The function do_some_work can perform arbitrary side-effects, including notifying somebody about the progress and the completion of the computations

    For convenience, you can define an OCaml function that will perform an arbitrary task in the background in this way:

    # let do_in_background work = def a () = work () ; 0 in spawn a () ;;

    val do_in_background : (unit -> unit) -> unit = <fun>

    # do_in_background (fun () -> print_string "all done\n") ;;

    - : unit = ()

    all done

    you can see that the work was indeed done in the background because the return value, " ()", was obtained before the message was printed

    Waiting forever

    Suppose you want to implement a function wait_forever () that blocks indefinitely, never returning. The chemical model: an instant "molecule" reacts with another, slow "molecule" ; but the slow "molecule" never appears in the soup

    def godot () & wait_for_godot () = reply () to wait_for_godot ;

    you also need to make sure that the "molecule" godot () is never injected into the soup. So you declare godot locally within the wait_forever function, and you will inject nothing into the soup ("spawn 0")

    let wait_forever () =

    def godot () & wait_for_godot () = reply () to wait_for_godot in

    spawn 0 ; wait_for_godot () ;;

    Parallel processing of lists

    you would like to call a function f : 'a -> unit on each element of a list s : 'a list. All function calls should be run concurrently in arbitrary order

    The chemical model: for each element x of the list, you use a separate reaction. All these "reactions" have the same payload: the application of f to x. Thus, you need a single "molecule" that carries x as the decoration value. For more flexibility, you can include the function f in the decoration value. The "reaction" is defined like this:

    def a(f,x) = f x ; 0

    To start the reactions, you need to inject the "a" "molecules" into the soup, and you need to inject as many instances of the "molecule" as elements in the list. So you would need to write spawn a(f,x1) & a(f,x2) & ... & a(f,xN). How can you perform this operation, given a list of values and a function? you could evaluate each spawn separately, using the standard list iterator:

    List.iter (fun x -> spawn a(f,x)) [0,1,2] ;;

    This will inject three copies of the slow "molecule" "a" into the soup. Three "reactions" will eventually start (in unknown order)

    Another possibility is to use the special loop syntax for "spawn":

    spawn for i = 0 to 2 do a(f, i) done ;;

    Waiting for completion of many jobs

    Suppose you want to start a number of background jobs, all of the same kind, and you need to wait (synchronously) until all of those jobs are finished. It is clear that each job is running as a payload on some reaction. Thus, you need to wait until all "reactions" of a given kind are finished. Let us denote by "job ()" the "molecule" that starts each of these reactions

    The only way to wait for something is by arranging a "reaction" that does not start until a certain "molecule" is present. Thus, you need a "molecule" that is absent from the soup until all our jobs are finished. Let us call this "molecule" "all_done ()". you could define a "reaction" that notifies somebody of the completion of all jobs:

    def all_done () = print_string "all done\n" ; 0

    Now, who will inject "all_done ()" into the soup?

    When one job is finished, the job "reaction" cannot know whether other jobs are still running. Also, the chemical machine cannot perform a direct test for the absence of a molecule, or a direct test for the presence of a certain number of identical molecules. Thus, you cannot have a "reaction" that generates "all_done ()" when all "job" "molecules" are consumed

    The only solution is to count the finished jobs by hand, knowing in advance how many jobs were started. So you need a "reaction" that knows how many jobs were started and generates "all_done ()" when no jobs remain unfinished, but not before. Since you cannot have "reactions" that involve several instances of the same molecule, you need to hold the number of unfinished jobs as a decoration value on some molecule. Let us call this "molecule" "remains(n)". Each job can consume "remains(n)" when done and inject a new remains(n-1) molecule. If nothing remains, you can inject all_done () into the soup.

    Our first try for the "job" "reaction" is this:

    def job(do_work) & remains(n) = do_work () ; if n>1 then remains(n-1) else all_done ()

    Initially you inject one instance of remains(n) and n instances of the job(...) "molecule" into the soup. Each job(...) "molecule" could carry its own workload function. When all_done () appears, you are sure that all jobs have finished. Let us test this: each job will simply print a digit from 0 to 9

    # spawn remains(4) ;; (* you consider things done when 4 jobs are finished *)

    - : unit = ()

    # spawn for i=0 to 9 do job(fun () -> print_int i) done ;; (* but you inject 10 jobs into the soup *)

    - : unit = ()


    0987all done

    spawn remains(3) ;; (* now you consider things done when 3 jobs are finished *)

    - : unit = ()

    654all done

    This solution is flawed in several ways. At the end of the calculation shown above, the soup still contains four job(...) molecules. However, there are no remains(n) molecules, so no further "reactions" are actually running. If you want to keep the jobs running even after the all_done () "molecule" was generated, you can modify the definition of the "reaction" so that the remains(n) "molecule" is always kept in the soup. Nothing prevents us from injecting several remains(n) "molecules" into the soup at the same time, with different values of "n". you could prohibit this by encapsulating the remains(n) molecules, so that the user cannot make a mistake when injecting the molecules

    There is another, more serious flaw in the present code ; can you see it?

    The flaw is that the remains(n) molecule is consumed by the job "reaction" (and injected only when a job is finished). While one job is computing its do_work () function, the remains(n) molecule is not in the soup. So only one job can run at any one time. you lost the concurrency!

    In order to restore the concurrency, you need to make sure that remains(n) molecule is always present in the soup. The updating of the value of "n" must be synchronous, but you would like this to be done while other jobs are running. Therefore, you need another "reaction" for updating of the value of "n" in remains(n). This "reaction" must be triggered asynchronously at the end of each job ; let us define a triggering "molecule" for this, called done (). The updating "reaction" could look like this:

    def remains(n) & done () = if n>1 then remains(n-1) else all_done ()

    The job "reaction" is then simplified:

    def job(do_work) = do_work () ; done ()

    The process looks like this: you inject several "job" "molecules" and one "remains" molecule. All jobs "molecules" can start at once, producing eventually a bunch of "done" molecules. These "done" "molecules" react one by one with the single "remains" molecule. All the "job" "reactions" can run in parallel, but only one "remains" "reaction" runs at any one time

    An alternative implementation is to make the "done" "molecule" an instant molecule. Then the "reactions" look like this:


    job(do_work) = do_work () ; done () ; 0


    remains(n) & done () = reply () to done & if n>1 then remains(n-1) else all_done () ;;

    Now each "job" "reaction" starts concurrently but blocks at the instant "done" call

    It seems that the chemical model is quite flexible and allows us to configure the concurrent computations in pretty much any manner

    Further examples

    I would like to consider three examples now:

    1. Given an array of functions, produce an array of their result values. The functions should be evaluated asynchronously in parallel
    2. Given an array of integers, compute their sum. All pairwise summations should be evaluated in parallel in arbitrary order, and the partial sums should be also added in parallel in arbitrary order
    3. Sort an array of integers using the merge-sort algorithm, again doing as much as possible in parallel.

    Now let us figure out the implementation of these examples in join calculus. you will be using the purely "chemical" approach to concurrent computation. you will never say the words "semaphore", "thread", "deadlock", "mutex", or "synchronize". Instead, you will talk about "molecules" and reactions. Our goal is to see what kind of tricks and patterns emerge from this paradigm

    Other limitations of join calculus

    In order to implement a concurrent computation of many functions, one might want to define a separate "molecule" for each array element. However, this is not possible in the join calculus. The join calculus has these limitations, beyond those I described before:

    Nevertheless, it seems that the join calculus can express essentially all concurrent computations. you will implement the examples now.

    Evaluate many functions in parallel and store results

    Each evaluation of a function must be a separate reaction. The payload of that "reaction" can assign the element of the array with the result value. So you only need one "molecule" for this reaction. Let this "molecule" be called "a" and carry, as decoration values, the function, the array, and the array index.

    let n = 10 ;;
    let tasks = (* array of functions, each returns 1 *)
    Array.make n (fun () -> 1) ;;
    let results = Array.make n 0 ;;
    def a(f,arr,i) = arr.(i) <- f () ; 0 ;;
    for i = 0 to n-1 do

    spawn a(tasks.(i),results,i)

    done ;;

    This works ; however, the "results" array is updated asynchronously, and the final array will be ready at an unknown time when all 100 "reactions" are finished. Certainly, you would like to be notified when this happens! As you have seen before, you need a counter that will show how many computations remain. This counter will start a new "reaction" when all computations are finished. This counter also needs a separate "reaction" for updating itself. How will you write the "reaction" for the counter?

    It is impossible in join calculus to wait until no more "a" "molecules" are present. Therefore, the counter should explicitly know how many "reactions" were started.

    To maintain the counter, you need to check how many "a" "reactions" have been completed. Let us produce an additional molecule, "ready", at the end of each "a" reaction, so that the counter could react with the "ready" molecules.

    So you modify the "reaction" for "a" like this:

    def a(f,arr,i) = arr.(i) <- f () ; ready ()
    and ready () & remains(k) = if k>1 then remains(k-1) else all_done ()
    and all_done () = print_string "all done\n" ; 0 ;;
    spawn remains(n) ;;

    The advantage of this approach is that the counting is performed asynchronously, independently of the "a" reaction. Thus, all "a" "reactions" can run in parallel, producing many "ready" molecules, while the "remains" "molecule" counts and consumes each of the "ready" "molecules" one by one.

    This code works ; here is the full test code for JoCaml. I added some printing so that you can watch the progress.

    # let test n =
    let tasks = Array.make n (fun () -> 1) in
    let results = Array.make n 0 in

    a(f,arr,i) = arr.(i) <- f () ; print_int i ; print_newline () ; ready ()

    ready () & remains(k) = if k>1 then remains(k-1) else all_done ()

    all_done () = print_string "all done\n" ; 0


    spawn remains(n) ;
    for i = 0 to n-1 do

    spawn a(tasks.(i),results,i)


    test 10 ;;
    all done
    - : unit = ()

    As you can see, the tasks were processed concurrently in somewhat arbitrary order.

    Sum of the elements in an array

    Suppose you have an array, say, {10,20,30,40,50}. you need to add some pairs, then add partial sums, etc., until all is added together. A first idea would be to inject "molecules" a(10), a(20), a(30), a(40), a(50) and organize "reactions" such that all numbers are eventually added together. It remains to define the necessary reactions.

    A "reaction" such as a(x) & a(y) = a(x + y) is not allowed because input "molecules" cannot be duplicated. How can you put two "a" "molecules" together in a reaction? Obviously, we need new input "molecules" here. If you define the "reaction" "a(x) & b(y) = b(x + y)", then you merely convert a's to b's, and the "reactions" will stop. you will not have succeeded in bringing a pair of "a" "molecules" together. So you need two further "molecules" and two reactions, say, like this:

    def a(x) & b () = a'(x) or a(x) & a'(y) = a(x + y) ;;

    Note that you use the "or" keyword ; this is required in JoCaml for "reactions" that use the same input molecules

    These two "reactions" are equivalent to the (disallowed) reaction a(x) & a(y) = a(x + y), as long as you always inject exactly as many "b" "molecules" as "a" molecules. (Then the number of "b" "molecules" will always remain equal to the number of "a" molecules.)

    So you find that the restriction to non-repeating "molecules" is not really a limitation of the expressive power of the join calculus. you can always define additional "molecules" and "reactions" to achieve the same effect as in a "reaction" with repeating molecules.

    Having defined these reactions, you will have achieved that eventually all "a(x)" "molecules" will come together and produce a single "a(z)" molecule, whose value z will be equal to the sum of all previously given values. However, you again face the same problem as in the previous example: namely, you will not know when this happens. There is no way to check that only one molecule "a(x)" is present in the soup.

    It seems that you again need some kind of "counter" that keeps track of how much remains to be done. What kind of "reaction" will this counter have?

    The only place that you perform the summation of a pair is the reaction a(x) & a'(y) = a(x + y). Therefore, you need to count how many such "reactions" have been finished, so that the last "a(x + y)" "molecule" knows that it is the last one.

    If you use the same "counter" technique as in the previous example, you will force the summation to proceed serially: the "counter" consumes each "ready" "molecule" one by one. So here you need a different tecnique.

    It seems that you need to put the progress information into the "a" molecules. For instance, the "a" "molecule" can carry the number of sums already performed as a second value. Let "a(x,k)" represent a partial sum of "k" elements, while the value of the partial sum is "x". Initially you will inject "a(x,1)" for each x from the given array. Let us see how to arrange the necessary reactions.

    At the end of the calculation, you will be left with a single "a(z,n)" "molecule" whose value "z" will be our resulting sum, while "n" will be equal to the total number of elements. There will be also a single "b" "molecule" (since the numbers of "a" and "b" "molecules" are always equal). Therefore, the last reaction, a & b = a', will happen at the end of the calculation. It is this "reaction" that you can use to generate the final "all_done" molecule. Let us put the value "n" on each of the "b" "molecules" from the beginning ; these values will be used to check that "a(z,n)" is indeed the last molecule. Our "reactions" therefore look like this:

    def a(x,k) & b(n) = if k=n then all_done () else a'(x,k)
    or a(x,k) & a'(x',k') = a(x+x',k+k') ;;

    In order to start the reactions, you need to inject "n" copies of "b(n)" as well as "n" "molecules" "a(x,1)".

    Here is some test code for JoCaml:

    # let test n =
    let inputs = Array.make n 0 in
    let () = for i=0 to n-1 do inputs.(i) <- 10*(i+1) done in

    all_done(z) = Printf.printf "all done, result=%d\n" z ; 0


    a(x,k) & b(n) = if k=n then all_done(x) else a'(x,k)

    a(x,k) & a'(x',k') = a(x+x',k+k')

    for i=0 to n-1 do spawn a(inputs.(i), 1) & b(n) done ;;
    val test : int -> unit =
    # test 10 ;;
    - : unit = ()
    all done, result=550

    Parallel merge-sort algorithm

    This is a more advanced example, so hold on. I assume you are familiar with the merge-sort algorithm

    Suppose our array is [4 ;2 ;6 ;1 ;5 ;3]. you would like to begin by making sorted pairs, making three arrays [2 ;4], [1 ;6], [3 ;5]. Then you merge the partial arrays: [1 ;2 ;4 ;6], [3 ;5]. Finally, the two last arrays need to be merged.

    In the previous examples, you performed concurrent computations in unknown order, and that was perfectly admissible. The situation is different now: you do not want to merge arrays in an arbitrary order because this would increase the asymptotic complexity of the algorithm. Since the order of "reactions" is essentially random, you would be often merging arrays of unequal size, which destroys the efficiency of the merge-sort algorithm. you need to organize the computation so that, e.g., an array of four is merged only with another array of four, unless no other such arrays exist or could be created. Occasionally you will need to merge arrays of unequal size, but you want to avoid this whenever possible.

    Let us try to think how to organize the computation. Merging of two arrays definitely needs to be a reaction, and the resulting sorted array needs to be carried by a molecule, say "sorted[1 ;2 ;4 ;6]". However, if each "molecule" carries one array, and if you inject "molecules" for each partial array, say "sorted[2 ;4]", "sorted[1 ;6]", "sorted[3 ;5]", you cannot prevent "reactions" from happening between any two sorted arrays of any size. you cannot use this method.

    What you need is to organize the "reactions" hierarchically. The merging of two arrays must be performed only when the two arrays are at the same "level" in the hierarchy.

    Our main problem here is that the chemical machine does not do any computations in order to decide which "reactions" to start. It starts "reactions" whenever "molecules" are available. It is impossible to specify that the "molecule" "a(x)" starts a "reaction" only if its value "x" satisfies some condition.

    you could explicitly store the information about which parts have been merged -- say, a list or a tree of some kind, stored on a special "dispatcher" molecule. This means you would write a complicated "dispatcher" code that will start all other reactions. This will perhaps work, but there is a more elegant solution.

    you could define separate reactions for merging of partial arrays of different size. Let us investigate this possibility.

    What you would ideally want is that the smallest arrays are merged pairwise in parallel ; then the next-level arrays are merged ; and so on. So each merging needs a different molecule. Suppose that the smallest arrays are carried by the "molecules" "a". Whenever two "a"'s merge, say "a[4]" with "a[2]", you would get a "molecule" "b[2 ;4]" with a sorted partial array. Whenever two "b"s merge, say "b[2 ;4]" and "b[1 ;6]", you get a "c[1 ;2 ;4 ;6]", and so on. So you would need to define as many different "molecules" as you have levels in the binary tree that represents the sorting scheme. If the initial array has length 1024, you would need at least ten different "molecules" ("a", "b", "c", ..., "j"). The problem is that the size of the array is unknown, so the required set of "molecules" would have to be computed at run time. However, the join calculus does not allow us to define a computed set of new molecules. All the "molecules" and their "reactions" need to be statically defined!

    It turns out you can overcome this limitation elegantly (i.e. without writing a lot of complicated logic) by using the basic tools of functional programming: locally defined molecules and recursion. The basic scheme of "merge-sort" is that you split the initial array in two roughly equal parts, sort each part recursively, and merge the two parts. The splitting is performed in a certain molecule, and this "molecule" needs to be recursively defined, since it delegates sorting to itself. The result of sorting each part is a sorted list in a "molecule" that you call "sorted". However, the "sorted" "molecules" need to only merge with "sorted" "molecules" at the same level of the hierarchy. you need to prevent the "sorted" "molecules" from merging with any other sorted results.

    The way to achieve this is to define the "sorted" "molecules" locally within the splitting molecule. Then the "sorted" "molecules" will be lexically confined to the body of the splitting "molecule" ; no other "molecules" will be able to enter any "reactions" with these locally defined "sorted" molecules. This will be the basic idea of our solution.

    What you would like to do, therefore, is to implement the splitting as a recursive definition of the "molecule" "mergesort". This "molecule" carries the initial, unsorted array, e.g. "mergesort[5 ;1 ;3 ;4 ;2]", and immediately starts a reaction, "mergesort(arr) = ...". When this "reaction" is finished, it should inject the "molecule" "sorted[1 ;2 ;3 ;4 ;5]", with the resulting sorted array as the decorating value.

    The "reaction" "mergesort(arr) = ..." should work like this: If the array has length 1, it is already sorted, and you can emit the "molecule" "sorted(arr)" right away. Otherwise you split the array "arr" in two parts and call "mergesort" on each part. These will eventually produce two "sorted" molecules, which you will then merge.

    So our initial idea of the code (not yet working) is this:

    def mergesort(arr) =
    if ( arr has length 1) then sorted(arr)
    def sorted(x) & sorted(y) = sorted(array_merge x y)
    (* this is not legal in join calculus, but we'll take care of that later *)
    let (arr1, arr2) = array_split arr in
    mergesort(arr1) & mergesort (arr2) (* recursively sort two halves *)
    (* results will be sorted(arr1') and sorted(arr2'),

    which will be merged recursively


    This code will not work, for two reasons. First, you are not allowed to have a "reaction" with two identical molecules. This problem is minor, and you already know how to fix it: Any "reaction" of the form

    a(x) & a(y) = whatever(x,y)

    can be always replaced by two "or"-coupled "reactions" with two new molecules, a' and b:

    def a(x) & b () = a'(x) or a(x) & a'(y) = whatever(x,y)

    The second problem is deeper: if you define the "sorted" "molecule" locally inside "mergesort", then "sorted" will be undefined outside "mergesort". This is a problem since you want to use "mergesoft" recursively. you expect that "mergesort(...)" yields a "sorted(...)" at the end, so that all the "sorted" "molecules" can be merged. If "sorted" is undefined outside "mergesort", you cannot use it to get the results of the "reactions" starting from "mergesort(arr1)" and "mergesort(arr2)". These "reactions" will produce "molecules" that are defined only locally inside them, and invisible to the scope where these "reactions" were started. Then our merging "reaction" "sorted & sorted = sorted" will never start, since it is defined for our local "sorted" molecule, not for the (two different) "sorted" "molecules" that are locally defined inside the recursive calls to "mergesort(arr1)" and "mergesort(arr2)".

    So we must define the "sorted" "molecule" outside "mergesort". However, if all "mergesort" "reactions" yield the same "sorted" molecule, the result of every "mergesort" will be free to combine arbitrarily with other results, and you lose the hierarchical organization of the merging reactions. Somehow you still need to have different "sorted" agents for every level of "mergesort"!

    The solution is to define two different "sorted" molecules, one outside and one inside the "mergesoft". The external "sorted" "molecule" (to be named differently, say "sorted_res") will be passed to the "mergesoft" "molecule" as a parameter, i.e. as a molecule constructor. The end result of the "mergesoft" "reaction" should be one external "sorted_res" molecule. The local "sorted" "molecule" defined inside "mergesort" will be used to sort the recursive results. This local "sorted" "molecule" will be also passed to the two recursively called "mergesort" molecules. In this way, these "mergesort" "molecules" will each produce a locally defined "sorted" molecule, which cannot react with any external "sorted" "molecules" (or with the internal recursively defined "sorted" molecules). The "sorted" "molecules" will be able to react only with the ones defined within the same scope. This automatically enforces the hierarchy of "reactions" that you need.

    Here is the complete test code for JoCaml:

    let array_split arr =
    let n = Array.length arr in
    if n<2 then raise (Invalid_argument "arraysplit") else
    let i = n/2 in
    (Array.sub arr 0 i, Array.sub arr i (n-i))

    let array_merge arr1 arr2 is_less =
    let arr = Array.append arr1 arr2 in
    let rec merge i1 i2 i =
    ( if i1 = Array.length arr1 && i2 = Array.length arr2
    then arr (* all done *)


    (* the first element is chosen only under this condition: *)
    let (x,new_i1,new_i2) = if i1 < Array.length arr1 && ( i2 = Array.length arr2 || is_less arr1.(i1) arr2.(i2) )
    then (arr1.(i1), i1+1, i2) else (arr2.(i2), i1, i2+1)
    (arr.(i) <- x ; merge new_i1 new_i2 (i+1))
    in merge 0 0 0


    let string_of_array arr string_of_elt separator =
    let rec arr2str i result = if i = Array.length arr then result
    else arr2str (i+1) (result ^ (if result = "" then "" else separator) ^ string_of_elt arr.(i))
    arr2str 0 ""


    def mergesort(arr, sorted_res) =
    if Array.length arr <= 1 then sorted_res(arr) else
    let (part1, part2) = array_split arr in

    sorted(x) & a () = sorted'(x) or sorted(x) & sorted'(y) = sorted_res(array_merge x y (<))

    mergesort(part1, sorted) & mergesort(part2, sorted) & a ()
    in (* mergesort is now defined ; now you set up the first call to it *)
    def print_result (arr) = Printf.printf "finished: [%s]" (string_of_array arr string_of_int " ;") ; 0
    in (* now you call mergesort with an externally provided "sorted_res" argument as "print_result" *)
    spawn mergesort([| 3 ; 2 ; 5 ; 1 ; 4 |] , print_result) ;;
    - : unit = ()
    finished: [1 ;2 ;3 ;4 ;5]

    Mixing global and local molecules?

    In the example just given, you defined the "mergesort" "reaction" recursively, and you defined new "reactions" locally in the body of the "mergesort" reaction. These local "reactions" used new (locally defined) "molecules" "sorted", "a", "b" as well as the previously defined "molecule" "sorted_res". Can you freely mix new, locally defined "molecules" with global or previously defined molecules? Experimentation shows that this does not always work:

    # def a () & b () = print_string "reaction a & b\n" ; reply true to b & a () ;;
    val a : unit Join.chan =
    val b : unit -> bool =
    # def a () & c () = print_string "reaction a & c\n" ; reply true to c & a () ;;
    val a : unit Join.chan =
    val c : unit -> bool =
    # spawn a () ;;
    - : unit = ()
    # c () ;;
    reaction a & c
    - : bool = true
    # (* now try the "reaction" a&b *)
    b () ;; (* this blocks forever! *)

    The second def statement introduces a new, locally defined "a" molecule, completely shadowing the previous "a" and its "reaction" "a ()&b ()". When you say "spawn a ()", you inject the new "a" molecule, not the old "a" with which "b" can react. The old "reaction" "a ()&b ()" is still defined but waits forever for the previous "a", which can never appear now.

    As you perhaps noticed, a def statement cannot add a new "reaction" to a previously defined input molecule! When a molecule, such as "a", is being defined, the chemical machine needs to know at once all the "reactions" in which "a" appears as an input molecule. An example: Suppose you have "molecules" "a" and "b" in the soup, and a "reaction" "a () & b () = ..." is defined. Then, it is not possible to add more "reactions" starting from "a" in a local expression, -- "local reactions", as it were. In my understanding, "additional local reactions" are not consistent with the semantics of the join calculus. Here is what would happen if you allowed such definitions, for example, an additional reaction "a () = ..." in a local expression. Suppose that the (fictitious) keyword "def_more" adds "reactions" to an already defined molecule

    def a () & c () = 0
    or b () & c () = d () ;;
    def_more a () = b () in spawn c () ;; (* not legal in JoCaml! *)
    spawn a () ;;

    you have set up the "reactions" so that "a", "b", "c", and "d" are globally defined. The "reaction" "a = b" is (supposedly) defined only locally within the expression "spawn c ()". Upon evaluating that expression, however, no "reaction" has started yet, since the expression "spawn c ()" does not block, it returns right away, injecting "c" into the soup. you also intend that the "reaction" "a=b" should be inactive outside the expression "spawn c ()". So the "reaction" "a=b" seems to be permanently invisible! It appears that the "molecule" "a" has an extra "reaction" defined in a local expression, but this "reaction" must remain undefined (and inactive) unless this local expression is "active". However, it is meaningless to say "an expression is active" ; an expression is not a process but a definition of a computable value. So this kind of code does not actually define any extra "reactions" at all. The effect must be the same as if you never defined the "reaction" "a=b"

    To summarize:

    1. Each def statement introduces new and locally defined "molecules" (listed on its left-hand side). In other words, all "molecules" appearing on the left-hand side of a newly defined "reaction" are new "molecules" and will shadow any previously defined "molecules" with the same names. (Previously defined "molecules" can be used only on the right-hand side of a def, i.e. as output molecules of a new reaction.) The chemical machine can define local molecules, but not additional "reactions" with any previously defined input molecules. For any molecule, all "reactions" where it appears as input "molecule" must be in one scope.
    2. Although a def statement introduces locally defined molecules, there is generally no such concept as a "locally defined reaction". Reactions defined for local "molecules" are just as permanently and globally defined as all other reactions. All "reactions" defined anywhere in the code are always available in the chemical machine, even though the local "molecules" may be invisible outside some expressions.

    What you learned so far

    The chemical machine (the join calculus) does not support any computations on the set of molecules, and yet you can impose any recursive (inductively defined) structure on the molecules. you can organize "molecules" in a list or in a tree, for instance. This is possible because you are allowed to define new "molecules" at run time in local scopes, as long as you define them statically.

    In the example with "merge-sort", you effectively imposed a tree hierarchy on "molecules" by using local definitions and recursion. To the chemical machine, all the "sorted" "molecules" defined by various recursive invocations of "mergesort" are different molecules, because they are defined in different local scopes. At the same time they are all statically defined. The system can thus guarantee consistency of reactions.

    for example, if you were allowed to define an "array" of 100 "molecules" and a function that injects "molecule" number n, and later try to inject a "molecule" number 101, you would get an error such as "array index out of bounds". These errors are simply not possible when all "molecules" are defined statically. The join calculus does not allow us to inject a "molecule" that somehow becomes undefined later.

    As you have seen so far, the join calculus has very few features and yet can express many different ways of organizing concurrent computations quite concisely. Is this what they call a "highly expressive" calculus?

    In the next section, you will use join calculus to solve a well-known problem in concurrent programming

    Dining philosophers

    In the previous section, you have seen that the join calculus has a limitation: the "molecules" and the "reactions" need to be specified statically (at compile time rather than at run time). Also, it is not possible to create new "reactions" involving old input molecules. you will now solve a classic exercise in concurrent programming -- known as the "dining philosophers" problem -- where you will have to work harder to overcome these limitations of the join calculus

    you will show two solutions: the first will be extremely simple but limited ; the second will be more involved but more flexible

    Formulation of the problem

    Five philosophers A, B, C, D, E sit at a round table. There is a fork between each two neighboring philosophers. Each philosopher is either thinking, or hungry, or eating. Thinking and eating take a random amount of time, always less than 1 hour. When a philosopher has finished thinking, he becomes hungry. When a philosopher is hungry and both forks next to him are free, the philosopher can start eating. When the philosopher finishes eating, he puts down both forks and starts thinking. The task is to program the behavior of each philosopher such that everyone always gets a chance to eat, for any choice of each philosopher's random duration of eating and thinking. That is, the program must not control the duration of eating and thinking, these durations must remain random! In effect, the only thing our program must determine is when each philosopher will start eating once that philosopher becomes hungry

    First solution

    Thinking and eating must be possible for each philosopher concurrently with other philosophers. Therefore, these processes must be represented by different reactions. Let us introduce "molecules" tA, tB, tC, tD, tE for thinking philosophers and hA, hB, hC, hD, hE for hungry philosophers. Let us also introduce "molecules" fAB, fBC, fCD, fDE, fEA representing forks situated between each pair of philosophers. Initially there will be the five forks and the five thinking philosophers. The "reaction" for thinking will consume a thinking philosopher, wait a random time, and produce a hungry philosopher:

    tA () = random_wait () ; hA ()

    The "reaction" for eating will consume a hungry philosopher and two forks, wait a random time, and produce a thinking philosopher and the two forks.

    hA () & fEA () & fAB () = random_wait () ; tA () & fEA () & fAB ()

    you simulate eating and thinking by a random delay:

    let random_wait () = Unix.sleep ( 3600) ;;

    you can now immediately express the requirements of the problem by defining the following reactions:

    def  hA () & fEA () & fAB () = random_wait () ; tA () & fEA () & fAB () 
    or hB () & fAB () & fBC () = random_wait () ; tB () & fAB () & fBC ()
    or hC () & fBC () & fCD () = random_wait () ; tC () & fBC () & fCD ()
    or hD () & fCD () & fDE () = random_wait () ; tD () & fCD () & fDE ()
    or hE () & fDE () & fEA () = random_wait () ; tE () & fDE () & fEA ()
    and tA () = random_wait () ; hA ()
    and tB () = random_wait () ; hB ()
    and tC () = random_wait () ; hC ()
    and tD () = random_wait () ; hD ()
    and tE () = random_wait () ; hE ()
    spawn fAB () & fBC () & fCD () & fDE () & fEA () & tA () & tB () & tC () & tD () & tE () ;;

    This is actually a complete solution of the philosopher's problem as stated! In order to verify that the system works, you can add some diagnostic printing to the eating and thinking reactions

    This solution makes sure that each philosopher starts eating only when both neighbor forks are free. Since both forks are consumed by a "reaction" at once, and the chemical machine will choose some "reaction" when forks are available, deadlock is impossible in this system!

    The brevity of JoCaml is remarkable in this example. you merely wrote down the conditions for the thinking and eating processes, and the result is already a working JoCaml program

    Problems with the first solution

    This solution is extremely concise and simple, but has two important drawbacks. The first obvious drawback is that the program defines all "reactions" and "molecules" statically (i.e. at compile time). Here you wrote 10 lines of code to define the necessary "reactions" for 5 philosophers ; so you have to use two lines of code per philosopher. If you want to simulate 10 philosophers, you will have to write 20 lines of code ; you cannot avoid writing the repetitive code. But the repetitive code is not the main problem. The main problem is that the solution is restricted to a fixed number of philosophers. In join calculus, as you have seen, it is not possible to add new "reactions" to already existing input "molecules" (or to disable existing reactions). So the first solution must set the number of philosophers at compile time ; you can neither specify nor change the number of philosophers at run time

    The second drawback is more subtle. This solution does not guarantee that every hungry philosopher receives the forks within a bounded period of time. It can happen that, say, A is eating and B is hungry. B can start eating only if neither A nor C are eating, so B has to wait while A is eating. But it can happen that C starts eating while A is still eating. Then let us suppose that A stops eating and, while C is still eating, A again gets hungry and starts eating ; only then C stops eating. you are back to the beginning, and B still has to wait. This situation may continue indefinitely, and B will stay hungry since always either A or C is eating. If the eating and thinking times are truly random and independent for all philosophers, eventually (with probability 1) the philosopher B will obtain two forks and start eating. However, there is no limit on the waiting time while hungry, for any philosopher! There is a nonzero probability that some philosopher will have to wait, say, 1000 hours before obtaining both forks

    To solve this problem, you need to prioritize the choice of who starts eating. Clearly, it should not be allowed that every philosopher starts eating as soon as forks are available. Sometimes, a hungry philosopher must let a hungry neighbor eat first. This can be organized in various ways ; for instance, if two neighbor philosophers are hungry and both have forks available, the philosopher who got hungry earlier could start eating first. (you can keep timestamps showing the most recent time of getting hungry for each philosopher, and you can implement the timestamps in such a way that no two philosophers get hungry at exactly the same time.)

    Second solution

    The second solution will be able to define the number of philosophers at run time, and will implement an algorithm for letting hungrier philosophers eat first. you now develop the JoCaml code step by step, trying to find the simplest possible solution

    Our first task is to implement a dynamically chosen number of philosophers. Let us look at the "reactions" in the first solution. The number of philosophers was statically defined because you needed two "reactions" per philosopher. All the "reactions" had to be defined together in a single def phrase, because the "reactions" use the same "molecules" as input and output. Is it possible to decouple the reactions, so that you can define "reactions" separately for each philosopher? How can you then define an arbitrary number of identical "reactions" with different molecules?

    In the join calculus, the way to define an arbitrary number of identical "reactions" is to define them locally in a closure. The first solution used "reactions" that were all coupled together: the first philosopher's "hA" "reaction" used the forks that were also used as input by the second philosopher's "hB" reaction, and so on. It is not possible to generate such a coupled "reaction" structure dynamically in join calculus, because it is not possible to define a new "reaction" coupled (through common input molecules) to an already existing reaction.

    Since you now want to decouple the reactions, you need to express the behavior of one philosopher through "molecules" that are not related to other philosophers. So, you have to abandon the first solution's elegant idea of using forks directly as molecules. Instead, as a first try, you could write something like this:

    def hungry () & forks () = eating ()
    and eating () = random_wait () ; thinking () 
    and thinking () = random_wait () ; hungry () ;;

    Here, a single molecule, "forks", is used for starting the eating process. The philosopher stays hungry until the forks are available ; otherwise, the philosopher thinks or eats for a random duration of time

    Who will inject the "forks" molecule? This must be done by a "reaction" external to all philosophers, a "waiter" who serves the forks and decides who will eat when. The "waiter" should have access to all the "forks" and know when each philosopher gets hungry, so that the "forks" "molecules" can be injected when possible and when needed

    For this chemistry to work correctly, the soup must always contain only one "molecule" representing each philosopher. This can be achieved if we define these "molecules" locally within a closure. The closure can also return the "forks" "molecule" to the external caller:

    let make_philosopher () =
    def hungry () & forks () = eating ()
    and eating () = random_wait () ; thinking ()
    and thinking () = random_wait () ; hungry ()
    spawn thinking ; (* inject a single "molecule" *)
    forks (* return this value *)

    This function will hide the definitions of "hungry", "eating", and "thinking" ; only the "molecule" constructor "forks" is returned as the resulting value and will be visible outside this function. you can now call this function several times, obtaining each time a new set of "molecules" and reactions. Exactly one copy of the "thinking ()" "molecule" will be injected into the soup. In this way, you can define and initialize "reactions" for as many philosophers as you need, at run time

    How will the waiter know that a certain philosopher got hungry or finished eating? Each philosopher's "reactions" need to notify the waiter about these events. Since these events are asynchronous, the way to notify the waiter is by injecting some "molecules" that react with the waiter. Let us call these "molecules" "got_hungry" and "done_eating". Since there is only one waiter, the waiter's "reactions" must be defined regardless of the number of philosophers, the "molecules" "got_hungry" and "done_eating" must be defined statically, i.e. not within the closure "make_philosopher". The "waiter" "reactions" will look something like this:

    def waiter () & got_hungry () = (* decide whether this philosopher can eat *) waiter ()
    or  waiter () & done_eating () = (* decide whether some neighbors of this philosopher can eat *) waiter ()

    The "molecules" "got_hungry" and "done_eating" are statically defined, and yet the waiter needs to identify the philosopher who got hungry or finished eating. So let us label each philosopher by an integer index and put the index onto these molecules

    def waiter () & got_hungry(i) = (* if philosopher i can eat, inject the forks for philosopher i *) waiter ()
    or  waiter () & done_eating(i) = (* inject forks for philosophers i+1 and i-1 if they can eat *) waiter ()
    The "reactions" inside "make_philosopher" must inject these "molecules" ; therefore, "make_philosopher" must receive the "molecule" constructors as arguments. For instance, the "thinking" "reaction" would look like this:

    thinking () = random_wait () ; got_hungry(i) & hungry ()

    So the function "make_philosopher" must also receive the index "i" as an argument. The code becomes

    let make_philosopher i got_hungry done_eating =

    def hungry () & forks () = eating ()

    and eating () = random_wait () ; done_eating(i) & thinking ()

    and thinking () = random_wait () ; got_hungry(i) & hungry ()

    in spawn thinking ; forks


    you can omit the "eating" "molecule" since you do not need to notify the waiter when the philosopher starts eating (the waiter needed to inject the "forks" and knows when that was done). So finally the function is simplified to this:

    let make_philosopher i got_hungry done_eating =

    def hungry () & forks () = random_wait () ; done_eating(i) & thinking ()

    and thinking () = random_wait () ; got_hungry(i) & hungry ()

    in spawn thinking () ; forks


    Suppose that you have "n" philosophers. you will call "make_philosopher" n times and store the n different "forks" constructors in an array

    let all_forks = Array.init n (fun i -> make_philosopher i got_hungry done_eating) ;;

    The "waiter" "reactions" can inject the forks for all philosophers. Let us suppose that you have a function, "may_eat" (to be defined later), that decides whether the philosopher number "i" may start eating. Then the "waiter" "reaction" could inject the "molecule" "all_forks.(i) ()" when appropriate:

    def waiter () & got_hungry(i) = if (may_eat i) then waiter () & all_forks.(i) () else waiter ()

    or waiter () & done_eating(i) = (* inject forks for philosophers i+1 and i-1 if they can eat *) waiter () ;;

    Now you have a minor problem: you cannot define "all_forks" before you define the "molecules" "got_hungry" and "done_eating" in the "waiter" "reaction" ; however, the "waiter" "reaction" already needs to use the "all_forks" array!

    This is like trying to define two functions in terms of each other. In the case of functions, you would use a mutually recursive definition using the "and" keyword. However, here you have a mutual recursion between an OCaml expression and a reaction ; it is not possible to define them simultaneously since "reactions" are not expressions

    To get around this problem, you will not define the "all_forks" array as a global variable, but keep it in the decoration value on the "waiter" molecule. So the code looks like this:

    def waiter(all_forks) & got_hungry(i) = if (may_eat i) then waiter(all_forks) & all_forks.(i) () else waiter(all_forks)

    or waiter(all_forks) & done_eating(i) = (* inject forks for philosophers i+1 and i-1 if they can eat *) waiter(all_forks) ;;

    let all_forks = Array.init n (fun i -> make_philosopher i got_hungry done_eating)

    in spawn waiter(all_forks) ;;

    you are close to finishing the solution ; what remains is to implement the waiter's decision, "may_eat i", to give forks to the philosopher "i". At the absolute minimum, the waiter needs to check that philosophers "i+1" and "i-1" are not eating. So the waiter needs to know the current state of each philosopher. How can you implement this?

    The state of each philosopher is represented by "molecules" "hungry ()", "thinking ()", "eating ()", but these "molecules" are defined in a closure and are invisible to the waiter. Even if these "molecule" constructors were made visible to the waiter, you could not use these "molecules" for reading the states of the philosophers. To do that, you would have to define a "reaction" that starts when, say, the philosophers "i+1" and "i-1" are not eating, but the join calculus cannot define a "reaction" with a computed set of input molecules. Our first solution avoided this limitation, but the price was that all "reactions" were defined statically

    Therefore, you need to represent the state of each philosopher as an explicit OCaml value, in addition to the molecules "hungry ()", "thinking ()", "eating ()". Let us create an array "states" containing a representation of the state of each philosopher. The waiter will read or update the values in that array as necessary:

    type philosopher_state_t = Eating | Hungry of int | Thinking ;;

    let states = Array.make n Thinking ;;

    let may_eat i states = (* check that "i" is hungry and both neighbors are not eating *) ;;

    def waiter(all_forks, states) & got_hungry(i) =

    if (may_eat i) then (

    states.(i) <- Eating ;

    all_forks.(i) () & waiter(all_forks, states)

    ) else (

    states.(i) <- Hungry (get_current_time ()) ;

    waiter(all_forks, states)

    ) or waiter(all_forks, states) & done_eating(i) =

    states.(i) <- Thinking ; (* inject forks for philosophers i+1 and i-1 if they can eat *)

    ... waiter(all_forks, states) ;;

    In the "Hungry" variant value, you store the time at which the philosopher became hungry, assuming that the function "get_current_time" returns an integer timestamp. you will use this timestamp value for deciding which philosopher gets to eat first

    It remains to implement the "may_eat" function and the second "waiter" reaction

    The "may_eat" function does not pose any problems specific to concurrent programming. This function merely checks the states of the philosophers next to the philosopher "i" and decides whether this philosopher may start eating. Here is a possible implementation:

    let next_phil i = (i+1) mod n ;;

    let prev_phil i = (n+i-1) mod n ;;

    let is_hungry p = match p with

    | Hungry _ -> true

    | _ -> false ;;

    let not_eating p = match p with

    | Eating -> false

    | _ -> true ;;

    let is_more_hungry p q = match q with

    | Hungry hj -> (

    match p with

    | Hungry hi -> hi <= hj

    | _ -> false


    | _ -> true


    let may_eat i states = is_hungry states.(i)

    && not_eating states.(next_phil i) && not_eating states.(prev_phil i)

    && is_more_hungry states.(i) states.(next_phil i)

    && is_more_hungry states.(i) states.(prev_phil i) ;;

    The "waiter" "reactions" need a bit more thinking. There are two "reactions" for the "waiter". The first "reaction" is

    waiter(all_forks, states) & got_hungry(i) = if (may_eat i) then (

    states.(i) <- Eating ;

    all_forks.(i) () & waiter(all_forks, states)

    ) else (

    states.(i) <- Hungry (get_current_time ()) ;

    waiter(all_forks, states)


    This "reaction" checks whether the philosopher number "i" is hungry, and starts the eating "reaction" (by injecting the forks) when both neighbors are either thinking or less hungry. If so, the waiter releases the forks for that philosopher

    Here it is important that the "waiter" "molecule" is released into the soup only after updating the "states" array. Otherwise, the "waiter" "reaction" could start processing another philosopher before registering the state of the first philosopher, which will lead to an inconsistency. (Note that you do not actually need to modify the "states" array itself ; instead you could keep the code purely functional, putting an updated copy of the "states" array onto the new "waiter" molecule.)

    The second "waiter" "reaction" needs to process the "molecule" "done_eating(i)". When the philosopher number "i" is done eating, it may happen that the neighbor philosophers were hungry and could start eating now. So the second "reaction" must check whether the philosophers numbered "i+1" and "i-1" may start eating, and if so, inject the forks for those philosophers. In other words, the second "reaction" must repeat for the philosophers "i+1" and "i-1" what the first "reaction" did for the philosopher "i"

    To avoid repetitive code, you would like to define a function that encapsulates this work. Perhaps, you would like to write the "waiter reactions" like this:

    let decide_eating i all_forks states =

    if (may_eat i) then (

    states.(i) <- Eating ;

    all_forks.(i) () & waiter(all_forks, states)

    ) else (

    states.(i) <- Hungry (get_current_time ()) ;

    waiter(all_forks, states)


    in (* will not compile! *)

    waiter(all_forks, states) & got_hungry(i) = decide_eating i


    waiter(all_forks, states) & done_eating(i) = states.(i) <- Thinking ; decide_eating (next_phil i) ; decide_eating (prev_phil i)


    However, this code will not work in JoCaml. The main problem is that you are trying to use the function "decide_eating" as the right-hand side of a reaction. However, an OCaml function must return an OCaml value, while the right-hand side of a "reaction" must be a molecule, which is not an OCaml value. What you can do within a function is to inject "molecules" ("spawn"). For instance, you could inject a "waiter" "molecule" and, if appropriate, a "forks" molecule. However, the second "reaction" needs to update the "states" array before injecting a "waiter" molecule. This will not be done correctly if "decide_eating" injects a "waiter" molecule, because this injection will happen asynchronously and possibly before updating the "states" array by the second call to "decide_eating"

    A correct solution is to compute boolean flags that determine whether the forks were injected and to write the output "molecules" by hand. The array "states" is contained within the "waiter" "molecule" and is modified whenever the waiter receives the information that some philosopher got hungry,:

    let decide_eating i states =

    if (may_eat i states) then (

    states.(i) <- Eating ;

    (states, true)

    ) else (

    (states, false)

    ) ;;

    def waiter(all_forks, states) & got_hungry(i) =

    states.(i) <- Hungry (get_current_time ()) ;

    let (states', will_eat) = decide_eating i states in (

    waiter(all_forks, states') & (if will_eat then all_forks.(i) () else 0)


    or waiter(all_forks, states) & done_eating(i) =

    states.(i) <- Thinking ;

    let (states', next_will_eat) = decide_eating (next_phil i) states in

    let (states'', prev_will_eat) = decide_eating (prev_phil i) states' in (

    waiter(all_forks, states'')

    & (if next_will_eat then all_forks.(next_phil i) () else 0)

    & (if prev_will_eat then all_forks.(prev_phil i) () else 0)

    ) ;;

    In this solution, the forks and the new "waiter" "molecule" are injected only after the "states" array is updated. you use a (mostly) pure functional style, using new variables (states', states'') for the updated array. In this way, it becomes clearer that the "waiter" "molecule" coming out of the "reaction" will carry an updated copy of the "states" array


    While implementing the second solution, the following limitations of JoCaml became apparent:

    1. It is not possible to define a function that evaluates to a molecule, because "molecules" are not OCaml values. This would be useful in order to eliminate repetitive code in outputs of reactions. for example, in the reaction

    def a(x) = if (...)

    then (* compute y', z' *) b(y') & c(z')

    else (* compute x', y', z' *) a(x') & b(y') & c(z')

    we would like to produce the "b(y)" and "c(z)" by a function, in order to avoid repeating the code that computes and emits these molecules. But it is not possible to define a function that directly defines output "molecules" in a reaction. (A function may evaluate to a molecule constructor, which can be used to inject molecules, but a function cannot directly produce a "molecule" expression.) To obviate this limitation, you can inject "b(y)" and "c(z)" inside a function using "spawn". There may be an implementation-dependent difference between "reactions" "def a(x) = spawn b(y) ; c(z)" and "def a(x) = b(y) & c(z)". However, you have seen in the second implementation of the "dining philosophers" that injecting "molecules" from within a "reaction" does not solve the problem, and some repetitive code must remain in the definition of the reaction

    A possible solution is to use "spawn" inside a function body. However, this will not allow us to control the precise time at which the "molecules" are injected into the soup. for example:

    def a () & b () = 0 ;

    let f x = spawn a () & c(x+1) ;;

    def c(x) = f x ; b () ;;

    Consider the "reaction" starting from spawn c(0). The evaluation of "f 0" will inject a () and c(1), and then inject b (). you cannot guarantee that b () will be injected at the same time as a () and c(1). The "reaction" definition "def c(x) = a () & b () & c(x+1)" would have guaranteed that

    2. Mutually recursive definitions of "reactions" and OCaml values are impossible. for example, a "reaction" defines the "molecule" "a" and uses a function "f", but the function "f" needs to use the "molecule" "a"

    def a(x) = a(f x) ;; (* error: "f" is undefined *)

    let f x = spawn a(x) ; x+1 ;; (* uses the "molecule" "a" and so cannot be defined before "def a" *)

    To obviate this limitation, the code needs to be rearranged (e.g. by putting some values onto molecules)

    def a(x,f) = a(f x, f) ;; (* only the type of "f" needs to be fixed at this point *)

    let f x = spawn a(x) ; x+1 ;; (* ok, "a" is already defined *)

    spawn a(0, f) ;; (* ok, "f" is already defined *)

    The (yet undefined) function "f" becomes a value stored in the "molecule" "a", so "f" does not have to be in scope when the "reaction" for "a" is being defined

    3. In the previous section you have seen (in the mergesort example) how to organize "reactions" in a recursive structure, such as a tree, at run time. But "reactions" cannot be organized, at run time, into a non-recursive structure where many instances of similar "reactions" are "coupled", i.e. have common input molecules. (Our first implementation of the "dining philosophers" shows an example of such a coupled "reaction" structure, where five "reactions" shared five input "molecules" and form a "ring".) Such non-recursive "reaction" structures can be defined in JoCaml only statically, since they must be contained in a single def phrase. A closure can create new instances of "molecules" at run time, but the resulting "reactions" will have different input molecules.

    Why is it not possible to define a coupled "reaction" structure at run time? Suppose you first define a "reaction" involving a () and b (),

    def a () & b () = ...

    Suppose you find, at run time, that you need to define another reaction involving "b ()" and "c ()". you would like to write,

    def b () & c () = ..

    But JoCaml cannot add new "reactions" to previously defined input molecules. All the "reactions" that consume "b ()" must be defined at once, statically, in a single def phrase

    To simulate the coupled behavior of "reactions" defined dynamically (at run time), additional dispatching code must be written by hand. The dispatching of dynamically defined "reactions" is the primary role of the "waiter" in our second implementation of the "dining philosophers"

    The solution involves two elements. First, you can generate, at runtime, many identical "reactions" with fresh "molecules" using closures. These "reactions" will be invisible in the outer scope ; only the constructors of some "molecules" will be available (as return values of the closures). These constructors are OCaml values and so can be stored in an OCaml data structure. Second, you maintain OCaml data structures that mirror the desired "reaction" structures ; a special "dispatcher" ("waiter") "reaction" will update the OCaml structure as the "reactions" proceed

    A JoCaml "molecule" library

    Correct chemistry often requires that certain "molecules" are injected only once, or under tightly controlled conditions. A carefully constructed chain of "chemical reactions" may fall apart if the user injects a few stray molecules. you have seen that JoCaml closures can create local molecules, encapsulating and controlling the injection of the molecules. Is it possible to encapsulate common "reaction" patterns into a "molecule library" that would be safe to use?

    Here are some examples of functionality that you would like to have:

    • Run a computation (a "job") in the background ; receive notice and report results when finished
    • Run a job in the background ; at any time, you should be able to check synchronously whether this job is still running. you can also wait synchronously for the completion of the job.
    • Run a job in the background ; either wait until finished, or cancel the job (results will not be reported if the job is cancelled)
    • Run many jobs in the background ; receive notice of progress and accumulate results when all done
    • Define a job as a composition of two jobs: Run the first job, wait until finished, start the second job.
    • Give synchronized access to a shared value (but using pure functions)
    • Mailbox with messages, Erlang-style matching on message values
    • Simulate "reactions" with repeated molecules: a(x) & a(y) = b(x + y), which is not allowed by straight JoCaml
    • For a previously defined set of "molecule" constructors "a", "b", "c", define a new "reaction" such as a(x)&b(y)&c(z) = f x y z ; 0 where "f" is an arbitrary, given function (which might also inject molecules). This is not possible in straight JoCaml ; can you somehow get around this?
    • A general "reaction" structure can be described by a directed graph: each node is a "molecule" constructor ; a "reaction" is a set of input "molecules" from which the arrows point into the (single) output molecule. Define (at run time) the "reactions" according to the graph and start them running.
    • "Map/Reduce" functionality: for a given set of data, generate a set of parallel "map" jobs that may filter this data and produce intermediate results, and a set of "reduce" jobs that gather the intermediate results and process them into the final result. A "concentrate" job can be also useful for gathering groups of intermediate results together.
    • Enable/disable molecules: Given a "molecule" constructor "a(x)", create new "molecules" "a_on ()", "a_off ()", "request_a(x)" so that the requests to inject "a" will be ignored if "a" is in the "off" position but granted if in the "on" position.
    Our goal now is to see how far you can implement these patterns and push JoCaml beyond its apparent limitations

    Static and dynamic molecules

    Molecules and "reactions" in JoCaml are always statically defined and immutable: if you write a reaction, for example

    def a(x) & b(y) = a(x + y) ;;

    we have statically (at compile time) defined the "molecules" "a" and "b" with integer values. It will be impossible to change the types of the values, or to undefine this reaction, or to define an additional "reaction" involving "a" and "b" as input molecules

    Does this mean that JoCaml cannot define new "molecules" and "reactions" at run time at all? No, this is not the case. you can write a function that defines this "reaction" and returns its molecules, for example:

    # let make2 () =

    def a(x) & b(y) = a(x + y)

    in (a,b) ;;

    val make2 : unit -> int Join.chan * int Join.chan = <fun>

    The function make2 () returns a pair of new "molecule" constructors each time it is called. These "molecule" constructors can be used to inject "molecules" into the soup. you can also call make2 many times and, say, store all returned values for future use:

    # let ten_sets_of_molecules = Array.init 10 (fun _ -> make2 () ) ;;

    val ten_sets_of_molecules : (int Join.chan * int Join.chan) array =

    [|(<abstr>, <abstr>) ; (<abstr>, <abstr>) ; (<abstr>, <abstr>) ;

    (<abstr>, <abstr>) ; (<abstr>, <abstr>) ; (<abstr>, <abstr>) ;

    (<abstr>, <abstr>) ; (<abstr>, <abstr>) ; (<abstr>, <abstr>) ;

    (<abstr>, <abstr>)|]

    Each of the 10 sets of "molecules" is separate ; you have created 10 different pairs of molecules. However, each pair is of the same type and has the same kind of reaction, namely "a(x) & b(y) = a(x + y)". The types of "molecules" and the kinds of "reactions" must be defined statically, but it is possible to define, at run time, any number of new "molecules" for the same reactions

    Reaction constructors

    The function "make_2" constructs a reaction. If you call "make_2" several times, you will obtain several independent instances of the same reaction, but with different input molecules. Can you get different reactions?

    you can parameterize "make_2" by specifying a closure for the right-hand side of the reaction:

    let make_ab f = def a(x)&b(y) = f x y ; 0 in (a,b) ;;

    The function "f" can contain arbitrary "spawn" statements, and so it can inject arbitrary other "molecules" as the result of the reaction. for example, if a "molecule" constructor "c" is already defined, you could write

    let (a,b) = make_ab (fun x y -> spawn c(x + y)) ;;

    This will create a new "reaction" equivalent to a(x)&b(y) = c(x + y)

    What if you want to reproduce the "reaction" a(x)&b(y)=a(x + y)? How can you make the function "f" refer to the "molecule" constructor "a" that is not yet defined?

    Let us try a recursive definition:

    # let rec (a,b) = make_ab (fun x y -> spawn a(x + y)) ;;

    Characters 8-13:

    let rec (a,b) = make_ab (fun x y -> spawn a(x + y)) ;;


    Error: Only variables are allowed as left-hand side of `let rec'

    This does not work! you need to avoid pattern-matching in "let rec":

    # let rec x = make_ab( let (a,b)=x in fun x y -> spawn a(x + y)) ;;

    Characters 12-60:

    let rec x = make_ab( let (a,b)=x in fun x y -> spawn a(x + y)) ;;


    Error: This kind of expression is not allowed as right-hand side of `let rec'

    Nope! Maybe the problem is that make_ab returns a tuple rather than a single value?

    # let rec a:int Join.chan = let make_ab f = (def a(x)&b(y) = f x y ;0 in a) in make_ab (fun x y -> spawn a(x + y)) ;;

    Characters 12-95:

    let rec a = let make_ab f = (def a(x)&b(y) = f x y ;0 in a) in make_ab (fun x y -> spawn a(x + y)) ;;


    Error: This kind of expression is not allowed as right-hand side of `let rec'

    This approach fails. you are trying polymorphic recursion here, and JoCaml refuses to understand it (even with an explicit type annotation)

    What exactly are you trying to do here? you would like to call "make_ab f" where "f" is such that "f x y" produces "a(x + y)". you can try to introduce a function "f" that takes "a" as another argument. This works!

    # let make_ab f = def a(x)&b(y) = f a b x y ; 0 in (a,b) ;;

    val make_ab :

    ('a Join.chan -> 'b Join.chan -> 'a -> 'b -> unit) ->

    'a Join.chan * 'b Join.chan = <fun>

    # make_ab (fun a b x y -> spawn a(x + y)) ;;

    - : int Join.chan * int Join.chan = (<abstr>, <abstr>)

    So in this way you have produced the constructors "a" and "b" and a "reaction" of the type "a(x)&b(y)=a(x + y)", and you are free to produce any number of "reactions" with two input molecules

    What is the most general kind of "reaction" with one input molecule? It is "def a(x) = f a x", where "f" is an arbitrary function. Exercise: define a "reaction" constructor for this reaction

    Can you perhaps take one more step towards generalization and put the function "f" on the "molecule" "a" itself? In this way, you just produce a general "molecule" "a(f,x)" that has the "reaction" "def a(f,x) = f a x". you can even define this "molecule" statically, you won't need any functions to produce it! Let us try:

    # def a(f,x) = f a x ; 0 ;;

    Characters 4-10:

    def a(f,x) = f a x ; 0 ;;


    Error: This join-pattern defines a channel of type

    (('a -> 'b Join.chan) * 'b) Join.chan

    but the channel is used with type 'a

    Uh-oh... The error message is sure a cryptic one! The problem here is that you are trying to use a recursive type. The type of "a" is ('a -> 'b Join.chan) * 'b) Join.chan, where 'a and 'b are (as yet) unknown. At the same time, this is the type of the first argument of "f". So you are trying to define a function whose argument's type contains the type of this function itself. This kind of recursion on types works only if you start jocaml with the option "jocaml -rectypes"

    $ jocaml -rectypes

    JoCaml version 3.12.1

    # def react1(f,x) = f react1 x ; 0 ;;

    val react1 : (('a -> 'b Join.chan) * 'b) Join.chan as 'a = <abstr>

    we can interpret "react1" as a universal "reaction" constructor for any "reaction" with a single input molecule. How can you use it? for example, suppose you want to construct the "reaction" "def a(x)=a(x+1)" and inject "a(0)". To construct this reaction, you merely need to inject "react(f,x)" with suitable "f" and "x". The function "f" takes arguments "a" and "x" and injects "a(f, x+1)" where "f" is again the same function. So you find that "f" must be a recursive function with a recursive type! It is easier to define this "f" separately and then to inject "react1(f,0)":

    # let rec f a x = spawn a(f, x+1) ;;

    val f : ('a * int) Join.chan -> int -> unit as 'a = <fun>

    # spawn react1(f, 0) ;; (* now watch CPU usage going to 100% due to "reactions" in the background! *)

    - : unit = ()

    Similarly, you can define a "reaction" constructor for two-molecule reactions, or for any other configuration of molecules:

    def react2a(f,x) & react2b(y) = f react2a react2b x y or react2a(g,x) = g react2a react2b x ;;

    So now you are fully equipped to deal with arbitrary "reaction" constructors.

    As you have seen, "reaction" constructors can be of two kinds:

    • either defined dynamically as functions such as make_ab, returning each time new "molecules" with a fixed "reaction" structure,
    • or defined statically (using recursive types) as "molecule" constructors such as react1, carrying the right-hand sides of "reactions" as decoration values.
    Note that the first kind of "reaction" constructors, "make_ab", are not pure functions: they return each time a fresh "molecule" constructor, not equal to any other "molecule" constructor. Well, "molecule" constructors themselves are not OCaml functions at all, so this "impurity" should not be taken as too grave. The static and declarative guarantees of JoCaml are still with us!

    More general constructors?

    Since the left-hand side of a "reaction" definition is a static pattern-matching construction, you cannot parameterize the names of the "molecules" and their values. you must use different names for each "molecule" and for each decoration value. you cannot write

    def a(x) & b(x+1) & c(x+2) = ... ???

    Neither can you write a function that takes "a" and "x" as the arguments and defines a "reaction" with "def a(x) = ...". This does not make sense in the join calculus. The left-hand side of a def must contain fresh names for all decoration values and for all input molecules

    Observe that this restriction is quite similar to defining a function with a "let" statement: you might define a function like this,

    let f (x, y, z) = ...

    but you are not allowed to write a definition of a function like this:

    let f (x, x+1, x+2) = ... ???

    Nor is it possible to write a function that takes "f" and "x" as its arguments and defines "let f x = ..." with these "f" and "x"! This kind of thing simply makes no sense within the semantics of functional programming. Each "let f x" defines a fresh name "f" and uses a fresh bound name "x"

    But this restriction does not significantly diminish our freedom to define functions! The main result of defining a function is the right-hand side of the function definition, the expression that becomes the body of the function. This is the value that you can parameterize ; for example, you can write a function that defines a new function and returns it. Let us recall how this works (in plain OCaml),

    let make_f g h = let f x = g (h x) in f ;;

    The name "f" defined inside "make_f" will not be visible outside "make_f", but the function itself (i.e. the value of type "function") will be usable. you can call "make_f" ten times and get ten different instances of a function "f". you can parameterize the body of "f" quite easily.

    However, you cannot parameterize the number of arguments in a function (i.e. you cannot write a function that defines another function with n arguments, where n is given at runtime). This is not a significant restriction since you can pack these arguments into a list, making them effectively a single argument. By doing this, you lose the static, compile-time guarantees about the types and the number of arguments of a function (a list can only contain elements of the same type), but at least you can write code that performs as required

    So you expect that static "reaction" definitions should not be a significant restriction on the expressive power of JoCaml, if you are prepared to do some extra organization and to lose some of the strict compile-time guarantees.

    The "reaction" definitions (also called "join definitions") have a left-hand side that carries significant information: the number of the input "molecules" necessary for the "reaction" and the patterns of values on the input molecules. This information cannot be packed into an OCaml value! So you will not be able to parameterize directly the left-hand sides of reactions. In the previous section, you wrote the function "make_2" that creates a new "reaction" with two input molecules. In that function, the number of input "molecules" is statically fixed. you will not be able to write a function "make_n" that creates a "reaction" with "n" input molecules, where "n" is an argument given at run time!

    There are two ways out: first, to create "make_1", "make_2", "make_12", etc., for all necessary types of "reactions" ; second, to use only "reactions" with two input molecules, and to simulate "reactions" with more input "molecules" by chaining. you need to create a different "reaction" constructor for each type of reactions. For instance, if you need simultaneously two "reactions" with the structure a&b = ..., a&c = ..., then you could define a "reaction" constructor like this:

    # let make_ab_ac g h = def a(x)&b(y) = g a b c x y or a(x)&c(y) = h a b c x y in (a,b,c) ;;

    val make_ab_ac :

    ('a Join.chan -> 'b Join.chan -> 'c Join.chan -> 'a -> 'b Join.chan) ->

    ('a Join.chan -> 'b Join.chan -> 'c Join.chan -> 'a -> 'c Join.chan) ->

    'a Join.chan * 'b Join.chan * 'c Join.chan = <fun>

    If you go too far into this direction, you will be essentially duplicating the functionality of the JoCaml "chemistry" system. you will use this technique only if you need a dynamically created "reaction" structure, i.e. a structure where the number of "reactions" is not known at compile time. If you can reduce this structure to a finite number of building blocks, such as the "reactions" created by make_2 or make_ab_ac or whatever, you will be able to organize the "reactions" as you need them at run time

    Running, monitoring, and canceling a job

    (to be continued)