Chu Spaces, Dialectica Spaces, and n-ary Relations

Dialectica spaces were first developed by Valeria de Paiva in her thesis. They happen to be a model of linear logic, in particularly they are a model of full intuitionistic linear logic. This fact came as quite a surprise to Valeria and her Ph.D. advisor Martin Hyland. The project they were working on was to give a categorical interpretation of Gödel’s dialectica interpretation. Naturally, they thought it would give rise to a cartesian closed category, but it turned out to be a model of linear logic!

There is a close cousin to dialectica spaces called Chu spaces. These spaces were developed by Michael Barr and his masters student Po-Hsiang Chu as a model of linear logic. They have also been widely studied by Vaughan Pratt and his group at Stanford. One of the most interesting computational applications of Chu spaces is in concurrency theory.

This post can be thought of as some of my notes on understanding the relationship between dialectica spaces and Chu spaces. The results in this post are mostly due to Valeria de Paiva and Vaughan Pratt; citations will be given below. The main objective of this post is to highlight the differences between the two types of spaces, and to show that Chu spaces can be embedded into dialectica spaces using Valeria’s work. Then we show that all \(n\)-ary relational structures can be embedded into dialectica spaces using Pratt’s work. Finally, we conclude with an application.

Consider the following definitions.

A dialectica space is a function \(r : A \times X \to \Sigma\) such that \(\Sigma\) is a lineale. We call \(\Sigma\) the alphabet and \(X\) the set of states of the dialectica space. Given a dialectica space we can define a new function that returns the columns of the space given a state by \(r_c = \mathsf{cur}(\beta;r) : X \to (A \to \Sigma)\) where beta is the canonical isomorphism \(\beta : X \times A \to A \times X\) and \(\mathsf{cur}\) is the canonical currying isomorphism. When \(r_c\) is injective we call \(r\) extensional.
A Chu space is a function \(r : A \times X \to \Sigma\). We call \(\Sigma\) the alphabet and \(X\) the set of states of the Chu space. Chu spaces can be extensional which is defined exactly how it is defined for dialectica spaces.

The previous definitions imply that we can think of dialectica and Chu spaces as matrices where the set of values in their cells are from \(\Sigma\). Extensional spaces are often easier to work with because the states of the space are unique.

There is only one difference between the two previous definitions, which is the requirement that \(\Sigma\) is a lineale in the definition of dialectica spaces. What is a lineale, and why do we need it? Consider the first question.

A monoidal poset is a poset \((\Sigma,\leq)\) with a given given compatible monoidal structure \((L,\circ,e)\). That is, \(L\) is a set with a reflexive, symmetric, and transitive relation \(\leq\,\subseteq L \times L\) together with a monoidal multiplication \(\circ : L \times L \to L\) and distinguished object \(e \in L\) satisfying the following:

  • Associativity: \(a \circ (b \circ c) = (a \circ b) \circ c\)
  • Symmetry: \(a \circ b = b \circ a\)
  • Identity: \(a \circ e = e \circ a = a\)
  • Compatibility: if \(a \leq b\) then \(a \circ c \leq b \circ c\), for all \(c \in L\)

A lineale is a monoidal poset \((L, \leq, \circ, e)\) such that for any \(a,b \in L\), there exists a largest \(x \in L\) where \(a \circ x \leq b\). This element is denoted by \(a \Rightarrow b\) and is called the relative pseudocomplement of \(a\) wrt \(b\). This definition implies that following hold:

  • Application: \(a \circ (a \Rightarrow b) \leq b\)
  • Currying: \(a \circ c \leq b\) iff \(c \leq a \Rightarrow b\)
  • If \(a \leq b\), then for any \(c \in L\), \(c \Rightarrow a \leq c \Rightarrow b\) and \(b \Rightarrow c \leq a \Rightarrow c\)

Thus, lineales can be seen as a trivialization of a symmetric monoidal closed category.

Lineales are needed to define transformations between dialectica spaces. We define these next along with transformations between Chu spaces. These are what really set the two types of spaces apart.

Suppose \(r : A \times X \to \Sigma\) and \(s : B \times Y \to \Sigma\) are two dialectica spaces over \(\Sigma\). Then a dialectica transformation is a pair of functions \((f,F) : r \to s\) where \(f : A \to B\), \(F : Y \to X\), and the following constraint holds:
\(\forall a \in A.\forall y \in Y.r(a,F(y)) \leq_\Sigma s(f(a),y)\).
We call the previous constraint the semi-adjointness condition, and this condition is the reason why we need the alphabet to be a lineale.
Suppose \(r : A \times X \to \Sigma\) and \(s : B \times Y \to \Sigma\) are two Chu spaces over \(\Sigma\). Then a Chu transformation is a pair of functions \((f,F) : r \to s\) where \(f : A \to B\), \(F : Y \to X\), and the following constraint holds:
\(\forall a \in A.\forall y \in Y.r(a,F(y)) = s(f(a),y)\).
We call the previous constraint the adjointness condition.

Taking dialectica spaces over \(\Sigma\) as objects and dialectica transformations as morphisms we obtain a category called \(\mathsf{Dial}(\Sigma)\), and doing the same for Chu spaces we obtain a category called \(\mathsf{Chu}(\Sigma)\). We have arrived at the following result.

Suppose \(\Sigma\) is a lineale. Then there is an embedding \(E : \mathsf{Chu}(\Sigma) \to \mathsf{Dial}(\Sigma)\).
Take \(E\) to be the identity functor. ◼

The results up to now are not new. They were first investigated by Valeria de Paiva in her TAC paper. She makes a lot more interesting connections between the two types of spaces. The interested reader should see her paper for more fun! We now use her result to show that every \(n\)-ary relation can be represented by a dialectica space.

\(n\)-ary Relations in Dialectica Spaces

Vaughan Pratt showed that any \(n\)-ary relation can be represented by a Chu space over \(\mathsf{2}^n = \{0,1\}^n\) for some \(n \in \mathbb{N}\). This results in Chu spaces being able to represent a large number mathematical structures. We would like this same result for dialectical spaces, and we can obtain it by using the embedding discussed above. However, we must show that \(\mathsf{2}^n\) is a lineale in order to obtain our result. We do this now.

First, we show that we can construct a category of lineales. We have morphisms between lineales.

Suppose \((L_1,\leq_1, \circ_1, e_1,\Rightarrow_1)\) and \((L_2,\leq_2, \circ_2, e_2,\Rightarrow_2)\) are two lineales. Then a lineale map between \(L_1\) and \(L_2\) is a function \(f : L_1 \to L_2\) satisfying the following properties:

  • If \(a \leq_1 b\), then \(f(a) \leq_2 f(b)\)
  • \(f(a \circ_1 b) = f(a) \circ_2 f(b)\)
  • \(f(e_1) = e_2\)
  • \(f(a \Rightarrow_1 b) = f(a) \Rightarrow_2 f(b)\)

It is easy to see that we can now form a category of lineales where composition and identities are set theoretic. We denote this category by \(\mathsf{Lin}\). Next we show that this category has products.

\(\mathsf{Lin}\) has all finite products.
Suppose \((L_1,\leq_1, \circ_1, e_1,\Rightarrow_1)\) and \((L_2,\leq_2, \circ_2, e_2,\Rightarrow_2)\) are two lineales. Then we can construct the monoidal poset \((L_1 \times L_2,\leq_\times, \circ_1 \times \circ_2, (e_1,e_2))\) where \(\leq_\times\) is the lexicographic combination of \(\leq_1\) and \(\leq_2\). Now we must prove that for any \((a,b),(c,d) \in L_1 \times L_2\), there exists a largest \((x_1,x_2) \in L_1 \times L_2\) where \((a,c)\,(\circ_1 \times \circ_2)\,(x_1,x_2) \leq (b,d)\). Choose \(a \Rightarrow_1 b\) for \(x_1\) and \(c \Rightarrow_2 d\) for \(x_2\). Certainly, each of these are largest within their respective set, and hence, their pairing will be largest in the cartesian product. Therefore, \((L_1 \times L_2,\leq_\times, \circ_1 \times \circ_2, (e_1,e_2),\Rightarrow_1 \times \Rightarrow_2)\) is a lineale.

Clearly, there are two lineale maps \(\pi_1 : (L_1 \times L_2,\leq_\times, \circ_1 \times \circ_2, (e_1,e_2),\Rightarrow_1 \times \Rightarrow_2) \to (L_1,\leq_1, \circ_1, e_1,\Rightarrow_1)\) and \(\pi_2 : (L_1 \times L_2,\leq_\times, \circ_1 \times \circ_2, (e_1,e_2),\Rightarrow_1 \times \Rightarrow_2) \to (L_2,\leq_2, \circ_2, e_2,\Rightarrow_2)\) defined by the set theoretic projections. Finally, we must show that the following diagram commutes:

where \((L_3,\leq_3, \circ_3, e_3,\Rightarrow_3)\) is a lineale. Suppose \(f : L_3 \to L_1\) and \(g : L_3 \to L_2\) are two lineale maps. Then choose the set theoretic function \(\langle f,g \rangle : L_3 \to L_1 \times L_2\) for \(h\). Clearly, in sets the previous diagram commutes, but we must verify that \(\langle f,g \rangle\) is a lineale map. We do this verification as follows:

  • Suppose \(a,b \in L_3\) and that \(a \leq_3 b\). Then \(\langle f,g \rangle(a) = (f(a),g(a))\) and \(\langle f,g \rangle(b) = (f(b),g(b))\). By assumption \(f\) and \(g\) are lineale maps, and thus, \(f(a) \leq_1 f(b)\) and \(g(a) \leq_2 g(b)\) which implies that \((f(a),g(a)) \leq_\times (f(b),g(b))\).
  • Suppose \(a,b \in L_3\). Then
    \langle f,g \rangle(a \circ_3 b)
    & = & (f(a \circ_3 b),g(a \circ_3 b))\\
    & = & (f(a) \circ_1 f(b),g(a) \circ_2 g(b))\\
    & = & (f(a),g(a))\,(\circ_1 \times \circ_2)\,(f(b),g(b))\\
    & = & \langle f,g \rangle(a)\,(\circ_1 \times \circ_2)\,\langle f,g \rangle(b).\\
  • First, \(\langle f,g \rangle(e_3) = (f(e_3),g(e_3))\), and because \(f\) and \(g\) are lineale maps \((f(e_3),g(e_3)) = (e_1,e_2)\).
  • Suppose \(a,b \in L_3\). Then
    \langle f,g \rangle(a \Rightarrow_3 b)
    & = & (f(a \Rightarrow_3 b),g(a \Rightarrow_3 b))\\
    & = & (f(a) \Rightarrow_1 f(b),g(a) \Rightarrow_2 g(b))\\
    & = & (f(a),g(a))\,(\Rightarrow_1 \times \Rightarrow_2)\,(f(b),g(b))\\
    & = & \langle f,g \rangle(a)\,(\Rightarrow_1 \times \Rightarrow_2)\,\langle f,g \rangle(b).\\

At this point all that is left to do is show that there is a terminal object. Let \(L_T\) be any one element set where \(\star \in L_T\) is its distinguished object. Then we can construct the lineale \((L_T, \leq_T, \circ_T, \star, \Rightarrow_T)\) as follows:

  • Set \(\leq_T = \{(\star,\star)\}\).
  • Set \(\star \circ_T \star = \star\).
  • Set \(\star \Rightarrow_T \star = \star\).

It is trivial to see that \(L_T\) is a lineale given the previous definitions. Suppose \((L, \leq, \circ, \star, \Rightarrow)\) is a lineale. Then there is a lineale map \(\diamond : L_1 \to L_T\) is simply the terminal map from sets. The verification that it is lineale is straightforward. ◼

If \(L\) is a lineale, then \(L^n = \underbrace{L \times \cdots \times L}_n\) is a lineale for any \(n \in \mathbb{N}\).

To conclude that \(\mathsf{2}^n\) is a lineale it is enough to show that \((\mathsf{2} = \{0,1\},\leq_B\,=\{(0,0),(0,1),(1,1)\},\land,1,\Longrightarrow) \), where \(\land\) and \(\Longrightarrow\) are binary conjunction and implication respectively, is a lineale. It is easy to see via truth tables that conjunction is symmetric, associative, and \(1\) is its identity. Thus, \(\mathsf{2}\) is a monoidal poset. Now consider the following truth table:

a & x & a \land x & b\\
0 & 1 & 0 & 0\\
0 & 1 & 0 & 1\\
1 & 0 & 0 & 0\\
1 & 1 & 1 & 1\\

This truth table proves that for any \(a,b \in \mathsf{2}\), there exists a greatest \(x \in \mathsf{2}\) such that \(a \land x \leq_B b\). Furthermore, notice that the columns for \(a\), \(b\), and \(x\) defines binary implication where \(x = a \Longrightarrow b\). Therefore, \((\mathsf{2},\leq_B,\land,1,\Longrightarrow) \) is a lineale. By the fact that \(\mathsf{Lin}\) is cartesian we may conclude that \(\mathsf{2}^n\) is a lineale.

At this point we simply replay Vaughan Pratt’s result. First, we define \(n\)-ary relations.

An \(n\)-ary relational structure \((X,\rho)\) consists of a set \(X\) and an \(n\)-ary relation \(\rho \subseteq X^n\) on \(X\). A homomorphism \(f : (X,\rho) \to (Y,\sigma)\) between \(n\)-ary relational structures is a function \(f : X \to Y\) such that \(f(\rho) = \{(f(a_1),\cdots,f(a_n)) \mid \overrightarrow{a_i} = (a_1,\cdots,a_n) \in \rho \} \subseteq \sigma\). Taking the relational structures as objects and their homomorphisms as morphisms defines a category called \(\mathsf{Str}_n\).

Next we define an embedding \(E’ : \mathsf{Str}_n \to \mathsf{Dial}(\mathsf{2}^n)\).

We define the object part of \(E’ : \mathsf{Str}_n \to \mathsf{Dial}(\mathsf{2}^n)\) as taking the \(n\)-ary relational structure \((A,\rho)\) to the dialectica space \(r : A \times X \to \mathsf{2}^n\) defined as follows. Take the set of states, \(X\), to be the subset of \(n\)-tuples \(r \in \mathcal{P}(A)^n\) for which \(\Pi_i r_i \subseteq \overline{\rho} = A^n – \rho\). Finally, define \(r : A \times X \to \mathsf{2}^n\) by \(v(a,x) = 1\) if \(a \in r_i\), and \(0\) otherwise.

The object part as defined above is injective.

For all \(\overrightarrow{a_i} \in A^n\), \(\overrightarrow{a_i} \in \rho\) iff for any \(r \in X\), there exists an \(i < n\), such that, \(r(a_i,x)_i = 0\).

It also turns out that \(r : A \times X \to \mathsf{2}^n\) is extensional. See lemma 12 of Pratt’s paper.

We still need to define the morphism part of \(E’\).

We define the morphism part of \(E’ : \mathsf{Str}_n \to \mathsf{Dial}(\mathsf{2}^n)\). Suppose \(f : (A, \rho) \to (B, \sigma)\) be a homomorphism between \(n\)-ary relations, where \(E'(A,\rho) = r : A \times X \to \mathsf{2}^n\) and \(E'(B,\sigma) = s : B \times Y \to \mathsf{2}^n\). Define \(f^{-1} : (\mathsf{2}^n)^B \to (\mathsf{2}^n)^A\) by \(f^{-1}(g : B \to \mathsf{2}^n) = f;g : A \to \mathsf{2}^n\). Finally, define \(E'(f) = (f,f’)\). The correctness of this definition can be found in Pratt’s paper in definition 13.

By theorem 14 of Pratt’s paper \(E’\) is concrete, full, and faithful. Thus, it is an embedding.

Application: Multisets in Dialectica Spaces

This line of inquiry of mine was started because I wanted a way to represent multisets by dialectica spaces over a finite alphabet. A multiset, \(m : A \to \mathbb{N}\), can be represent by an equivalence relation, \(\cong_m \subseteq A \times A\), such that, for any \(a \in A\), \(|[a]_{\cong_m}| = m(a)\). That is, the cardinality of the equivalence class of a particular element in the multiset is the elements multiplicity. Taking this representation of multisets and the previous result we now have a means of interpreting multisets into dialectica spaces over a finite alphabet.

Posted in category theory, Chu spaces, computer science, Concurrency, Dialectica Spaces, Logic, metatheory, Semantics | Tagged | 2 Comments

On the Categorical Model of Fragmented Y

\(\newcommand{\interp}[1]{\unicode{x27e6} #1 \unicode{x27e7}}\)
I recently published a post on the categorical model of fragmented system T. Aaron Stump — my Ph.D. advisor — asked a very good question. One property the Trellys team wants in a PL is a non-terminating fragment to conduct general purpose programming. However, as Aaron pointed out fragmented system T has a terminating programmatic fragment, but what if we make it non-terminating? He asked particularly about adding the Y combinator.
My response was that I did not know how to add just the Y combinator, but I argued that the programmatic fragment could be non-terminating by making it uni-typed.

I took his question as a challenge to figure out how to add just the Y combinator, and then obtain a categorical model. One thing had to be true of our model, it must have a cartesian closed subcategory to model the logical fragment in. Adding the Y combinator to STLC is simple enough, and since STLC is modeled by cartesian closed categories, then there should be an extension of cartesian closed categories with fixpoint operators. Keeping this idea in mind I did some thinking, and looked into the literature. It turns out that we can extend a cartesian closed category with fixpoint operators to obtain a model for the Y combinator. This post describes how this is done.

Fragmented Y

First thing is first. We need to define the functional PL we will be analyzing.

Just as we did in the previous post the programmatic fragment contains all typable terms using the rules annotated with \(\mathsf{Y}\), and the logical fragment contains all typable terms using the rules annotated with \(\mathsf{S}\). We can easily see that the programmatic fragment is non-terminating. Clearly, \(\cdot \vdash_{\mathsf{Y}} \mathsf{Y}\,(\lambda x:T.x) : T\), and

\(\mathsf{Y}\,(\lambda x:T.x) \leadsto (\lambda x:T.x)\,(\mathsf{Y}\,(\lambda x:T.x)) \leadsto \mathsf{Y}\,(\lambda x:T.x)\)

In addition, it is well known that the logical fragment is terminating. Thus, the logical fragment is the isolation of a terminating subtheory of a non-terminating theory.

Categorical Model

The idea is to argue how we can model the theory in a cartesian closed category with some additional structure. Just as we have said above the logical fragment can be modeled by a cartesian closed category. The subtheory rule \({\rm Y\_{\small sub}}\) can be modeled by functor from the model of the logical fragment to the model of the programmatic fragment. This is the same as it was for fragmented system T. In addition, it is easy to handle \(\mathsf{zero}\) and \(\mathsf{suc}\). So the most interesting question here is how do we model the programmatic fragment? The answer is well known, and can be found here — this book has a lot of great things in it, but lacks in presentation and the omission of details in some of the equational reasoning.

We model the programmatic fragment in a cartesian closed category extended with fixpoints for each object.

In any cartesian closed category we have the morphism \(\pi_2 : \Gamma \times (T \Rightarrow T) \to (T \Rightarrow T)\). Using this morphism we can define the following morphism in a cartesian closed category with fixpoint operators:

\( \mathsf{cur}(\pi_2;\mathsf{fix}_T) : \Gamma \to (T \Rightarrow T) \Rightarrow T \)

This morphism is the \(\mathsf{Y}\) combinator. That is, we can define

\( \interp{\Gamma \vdash_{\mathsf{Y}} Y : (T \to T) \to T}\,=\,\mathsf{cur}(\pi_2;\mathsf{fix}_{\interp{T}}) \)

A categorical model must have enough structure to model every typable term, but it must also model the reduction rules. That is, for any \(\Gamma \vdash_{\theta} t_1 : T\) and \(\Gamma \vdash_{\theta} t_2 : T\), if \(t_1 \leadsto t_2\), then \(\interp{\Gamma \vdash_{\theta} t_1 : T} = \interp{\Gamma \vdash_{\theta} t_2 : T}\), where \(\theta \in \{\mathsf{S},\mathsf{Y}\}\). It is well-known that if \(\theta = \mathsf{S}\), then the reduction rules of STLC can be modeled by a cartesian closed category. In fact, we have \(\beta\)-equality in the model:

(\(\beta\)-cat) \( \langle \mathsf{cur}(f),g \rangle;\mathsf{app} = \langle \mathsf{id},g \rangle;(\mathsf{cur}(f) \times id);\mathsf{app} = \langle \mathsf{id},g\rangle;f \)

Using \(\beta\)-cat we can prove the following:

We call the previous fact \(\mathsf{Y}\)-cat. Now we can show that a cartesian closed category with fixpoint operators models the reduction rule for the \(\mathsf{Y}\)-combinator:

It took me the better part of the weekend to get the equational reasoning right, so hopefully I did not make any mistakes.

The way we model the Y combinator is slightly different from the book here. There the authors use the morphism \( !_{\Gamma};\mathsf{cur}(\pi_2;\mathsf{fix}_T) : \Gamma \to (T \Rightarrow T) \Rightarrow T \), where \(!_\Gamma : \Gamma \to \top\) is the terminal morphism, but I do not see a reason to use the terminal morphism when we can just use \(\pi_2\). We obtain the same result here, but I might be missing something.

Finally, we arrive at the following main theorem:

Suppose that \(\mathcal{S}\) is cartesian closed category (a model of the logical fragment of fragmented Y), and that \(\mathcal{Y}\) is a cartesian closed category with fixpoint operators (a model of the programmatic fragment of fragmented Y). Then \( (\mathcal{S}, S, \mathcal{Y}) \) is a model of fragmented Y, where \( S : \mathcal{S} \to \mathcal{Y} \) is a full and faithful functor.

It was a lot of fun figuring this out!

Posted in category theory, metatheory, normalization, Semantics, STLC, Trellys, Type Theory | Tagged , , , , , | Leave a comment

On the Categorical Model of Fragmented System T

Recently, I have been thinking of something I wanted to try for a while, but my categorical skills were not good enough to think about it. In this short simple note I show how a fragmented type theory can be modeled using submodels and a functor between them. The theory I discuss is very basic, but contains the setup I wanted. I hope to build on this in the future.

The Trellys project originally set out in 2008 to design a dependently typed functional programming language with a fast amount of features. The most important of them being the isolation of a logical fragment from a non-terminating fragment called the programmatic fragment. The former amounts to a terminating fragment of the overall language with a new property called freedom of speech. This new property allows for the types of the logical fragment to depend on programs of the programmatic fragment. Thus, the logical fragment can be used to verify properties of the programs of the programmatic fragment. In addition, one can use the logical fragment to ones advantage. If one wishes to implement a program that must be terminating, then this can be internally verified by simply implementing it in the logical fragment, because it is simply a fragment of the set of overall programs.

This fragmentation was accomplished by annotating the sequent with either an \(\mathsf{L}\) or a \(\mathsf{P}\) called a consistency classifier. Then maintaining consistency of the logical fragment was achieved by placing value restrictions at key locations in the definition of type-checking algorithm. Then the addition of a rule similar to:

was added to the programmatic fragment. However, this formalization of the language resulted in a theory that was very difficult to study metatheoretically. New attempts are underway to try and modify the language so as to allow for a metatheoretic study. There are a number of references the interested reader can find with respect to this work. See my thesis and Chris Casinghino’s publications, Vilhelm Sjöberg’s publications, Stephanie Weirch’s publications, and Aaron Stump’s publications.

In this note we examine how one might model a fragmented language such as the Trellys language categorically. We start with a very simple language, and then show that there exists a strightforward functorial semantics.

Fragmented System T

We begin with the definition of the simply typed \(\lambda\)-calculus (STLC).

STLC will be considered our logical fragment. Its terms may be considered proofs while its types formulas. We now enrich this definition with a programmatic fragment resulting in our final system called fragmented system T.

The programmatic fragment contains all typable terms using the rules annotated with \(\mathsf{G}\). This fragment does not make for a very good logic, but is a very good statically typed pure functional programming language. In fact, the programmatic fragment is very powerful even if it seems simplistic. For example, we can type the Ackerman’s function:

\( \lambda m : \mathsf{Nat}.\lambda n : \mathsf{Nat}.((\mathsf{rec}\,m\,(\lambda y : \mathsf{Nat}.(\mathsf{suc}\,y)) \\
\,\,\,\,\,(\lambda h : \mathsf{Nat} \to \mathsf{Nat}.\lambda i : \mathsf{Nat}.\lambda y : \mathsf{Nat}.(\mathsf{rec}\,y\,(h\,(\mathsf{suc}\,0))\,(\lambda k : \mathsf{Nat}.\lambda j:\mathsf{Nat}.(h\,k)))))\,n) \)

Categorical Model

The purpose of this note is to define the categorical model of fragmented system T. The logical fragment (STLC) can easily be modeled by a cartesian closed category. See here for a detailed explanation of this. The programmatic fragment is also well known to be modeled by a cartesian closed category extended with a natural number object and the necessary morphisms to model zero and successor. See here for more information.

So the only interesting question we have is how do we model the subtheory rule:

This requires an operation on homspaces:

Clearly, this is corresponds to the morphism part of a functor from the model of the logical fragment to the programmatic fragment. It just so happens that such a functor exists.

Suppose that \(\mathcal{S}\) is the cartesian closed category modeling the logical fragment, and that \(\mathcal{G} \) is the extended cartesian closed category with the necessary structure modeling the programmatic fragment. Clearly, there exists a functor \(S : \mathcal{S} \to \mathcal{G}\). In addition, this functor is full and faithful, because \(\mathcal{S}\) is isomorphic to the cartesian closed proper subcategory of \(\mathcal{G}\). The latter insures that all of the typeable terms of the logical fragment are indeed modeled within \(\mathcal{S}\). Using this reasoning we arrive at the following result:

Suppose that \(\mathcal{S}\) is a model of the logical fragment of fragmented system T, and that \(\mathcal{G}\) is a model of the programmatic fragment. Then \( (\mathcal{S}, S, \mathcal{G}) \) is a model of fragmented system T, where \( S : \mathcal{S} \to \mathcal{G} \) is a full and faithful functor.

The point of the previous result is that if one fragments a language into parts where the constituent parts can be collapsed back down — e.g. the subtheory rule — then there is a functor between the constituent parts. More on this later.

Posted in category theory, metatheory, Semantics, STLC, Trellys, Type Theory | Tagged , , , , , | 3 Comments

Actor-networks: Social Processes as Computations and Computations as Social Processes

The phone rings at the desk of a secretary for a top-level employee at a medium sized company. The following conversation ensues:

    Secretary: You have reached the desk of Big Wig at medium sized company. I am S how may I help you?

    Caller: Hey S, how you doing today? It’s been a while since we have talked. This is E from the department of information technology.

    Secretary: Oh, hey, yeah it has been a while. Is there anything I can do for you?

    Caller: We are trying to get all of our internal software updated before the holiday. The bosses down here are getting agitated that we haven’t got some of the workstations updated yet. So I am on the hook to call up everyone at an out of date workstation, and have them install the update. Think you can help me out with that right now?

    Secretary: Sure, that’s no problem! What do I need to do?

    Caller: Great! The process is simple. I just sent you an email. All you have to do is click the link, and save the download to your desktop.

    Secretary: Just got your email. Okay, I think it is downloaded.

    Caller: Now simply click the file you downloaded and hit next a few times. Then it should say that it successfully installed.

    Secretary: Just finished.

    Caller: Excellent! Go a head and open the internal software and make sure everything is still running okay.

    Secretary: Everything seems to be working.

    Caller: Perfect. So that we don’t get confused when we update again later go a head and delete that file. Then empty your trash. Having files in the trash can make your machine run slower.

    Secretary: I didn’t know that it could make my machine run slower. I deleted it and emptied the trash.

    Caller: That’s all I needed. Say hello to Big Wig for me, and if you have any issues please do not hesitate to call our department. Thanks for helping.

    Secretary: No problem at all. Have a great holiday.

In the above scenario the caller, E, was not actually an employee of the information technology department. They were a malicious hacker trying to gain access to the companies network. They did this by posing as an employee to gain the confidence of the secretary. Then they had them install an “update to the internal software”, but actually this “update” was a virus designed to give E access to the companies network. The virus could have done much more like steal passwords, credit card numbers, and even move from machine to machine by hiding in file attachments. This style of virus is not meant to destroy the machine, but act as a mole and transmit information from the machine to E. It could hide and be unnoticed for a substantial amount of time.

The type of attack E used is called social engineering. It entails manipulating another person into gaining access to confidential information or resources. Social engineering to gain access to computer networks is often considered to be the easiest method to circumvent the security of a workstation on a computer network. Then once access is obtained it is easier to gain access to the other network resources. The notorious computer hacker turned computer security consultate Kevin Mitnick used this technique to gain access to the networks of various prominent technology companies.

The study of computer security uses mathematical models of various styles of networks, and then studies various attacks on these models using various mathematical tools. For example, a network of two machines can be modeled by two Turing machines, and then an additional tape that both Turing machines have access to. Then shared communication takes place by reading and writing on the shared tape.

In this blog post I am going to summarize some of the work of Dusko Pavlovic. He has started working on solving two major problems in computer security. I hold the same beliefs that he does with respect to the first problem. See this email for a list of papers in this areas.

The first problem with the study of computer security is that it largely ignores the social aspect of computer security. There is a human element that must be considered. However, one may object stating that capturing the human involvement will be to difficult, because there are just to many variables to keep track of. I hope that this blog post will connivence the reader that this objection is just not true.

The second problem with computer security is that practical models of various security scenarios are very complex. They usually entail lots of crazy diagrams, and even more complex reasoning. This implies that there are most likely errors in reasoning that go unnoticed! Thus, the community needs more formal methods that strip away this complexity, and where this complexity cannot be removed perhaps automated or semi-automated — e.g. interactive theorem provers — reasoning can be used to provide a higher confidence in the results obtained using the models.

I believe — and I am sure Dusko does as well judging by his papers — that category theory, type theory, verification of programs, and automated theorem proving can contribute to solving the second problem. Category theory is very good at stripping away unneeded complexity, and the various graphical languages that are available in category theory can be used to strip away the complexity of security models, and provide easier methods of reasoning. In addition, type theory, verification of programs, and automated theorem proving can be used to automate some of the receptive complex reasoning.

In the remainder of this blog post I would like to describe an effort to study the first problem. That is, how do we add the human element formally? Dusko Pavlovic and Catherine Meadows have a paper called “Actor-network procedures: Modeling multi-factor authentication, device pairing, and social interaction.” They introduce the concept of an actor network that gives a way to formally connect humans and devices. The rest of this blog post is a summery of what an actor network is.

There are lots of objects that must be paired with a human in order for their full potential to be reached. One analogy the authors give is that of a musical instrument. Suppose an acoustic guitar is leaning against a wall. The guitar sits ideally waiting to be played, because a musical instrument cannot make music on its own. It must be played by a human. The instrument is the actor. Humans control actors. The coming together of these several actors controlled by various humans is called an actor network.

Computers provide another example. There is very little we as a society do with our computers that do not require a human element. Modern day computer security uses this to its advantage when authorizing use of resources. Consider my old bank the University of Iowa Community Credit Union in Iowa City, IA. In order to access the online banking website one had to first enter a username and password, but then if I haven’t logged in since I cleared my browser data, then the website would send a code to my cellphone — actually it was my wives phone which made it even hard for me to login from work — then after receiving this code I was able to log in. This is called two factor authentication. The actor network here consists of the human (me), my computer, my cell phone, and the banks webserver.

So what is the formal definition of an actor network? Start with the following precategory:

where \( \mathcal{C} = \{f,g,\ldots\} \) is a collection channels, \( \mathcal{P} = \{P,Q,\ldots\}\) is a collection of configurations, \(\delta : \mathcal{C} \to \mathcal{P} \) maps each channel to an entry configuration, and \(\varrho : \mathcal{C} \to \mathcal{P} \) maps each channel to an exit configuration. So we can denote channels by \(f : \delta(f) \to \varrho(f) \). Thus, channels are premorphisms and configurations are objects.

We pair the previous structure with a collection of nodes denoted by \(\mathcal{N} = \{M,N,\ldots\} \). Then we define configurations to be trees. Each \(P \in \mathcal{P}\) can either be a finite set of nodes or a finite set of configurations. We define an actor to be an element of a configuration. Nodes represent canonical configurations. Another example might be that a paperback book is a configuration consisting of paper, glue, and ink.

Channels connect configurations together. For example, there might be a channel between a phone and a webserver. A secure channel might be a channel between two devices that is encrypted.

Our definition so far can account for devices (configurations) and the channels connecting them, but we what about all this human business? We now extend the previous definition to the following structure:

where \(\mathcal{I} = \{A,B,\ldots\} \) is a collection of identities or principles, and \( © : \mathcal{P} \to \mathcal{I} \) is a partial function assigning identities to configurations. Now we can talk about a person controlling a configuration or node.

One last addition. As of right now channels are left very abstract, and we have no way of specifying what type of channels we are dealing with. So we extend the previous structure as follows:

where \( \Theta = \{\tau_1, \tau_2\, \ldots \} \) is a collection of channel types, and \(\vartheta : \mathcal{P} \to \Theta\) is a map assigning each channel a type. This final structure defines an actor network.

Lets consider an example. It has recently been announced that the United States Army Cyber Command headquarters is moving to Augusta, GA. In fact, to Fort Gordon where I live only two miles away. So in honor of this new move I share the example from Dusko and Catherine’s paper. We define the actor network modeling a cyber network as follows:

  • \(\mathcal{I} = \{A_1,\ldots,A_n\}\), i.e. an arbitrary set of identifies,
  • \(\mathcal{N} = \{N_1,\ldots,N_n\}\), i.e. an arbitrary set of nodes,
  • \(\mathcal{P} = \mathcal{N}\), i.e. the only configurations are nodes,
  • \(\mathcal{C} = \{M \to N \mid M,N \in \mathcal{N}\}\), i.e. there is exactly one channel between any two nodes,
  • \(\Theta = \{\mathsf{cyber}\}\), i.e. there is only one type of channel called \(\mathsf{cyber}\) representing an insecure cyber connection,
  • All communication is done by broadcast from the sender to all nodes in the network. The receipt of the message does not observe the sender directly. Finally, we assume a person only controls one node.

For more examples see the paper. This is only the first part of the paper. I will be posting a second post talking about more things I learn from it.

Posted in category theory, Computer Security | Tagged , , , | Leave a comment

Categorical Proof Theory of Dualized Type Theory: Filtered Multicategories

\(\newcommand{\ottnt}[1]{\mathit{#1}}\newcommand{\ottmv}[1]{\mathit{#1}}\newcommand{\ottkw}[1]{\mathbf{#1}}\newcommand{\ottsym}[1]{#1}\newcommand{\ottcom}[1]{\text{#1}}\newcommand{\ottcomplu}[5]{\overline{#1}^{\,#2\in #3 #4 #5}}\newcommand{\ottcompu}[3]{\overline{#1}^{\,#2<#3}}\newcommand{\ottcomp}[2]{\overline{#1}^{\,#2}}\)

In the last post I talked about the need for a categorical proof theory of BINT. In this post I would like to start laying out an idea for how we can obtain one for Dualized Type Theory (DTT). At first I had planned on laying this post out as a private note, but I decided to make it public to try and get feedback. I am not sure if my idea already exists or not, and if the reader knows please do tell me! Now in this post I only detail the initial categorical framework I will interpret DTT in, and so, I will be presenting filtered categories as multicategories, but the final model we will eventually arrive at in a future post will be a polycategorical version of what is described here. However, the latter has its own hurdles to get through, and I am still trying to understand those hurdles.

A filtered category (FC) can be thought of as starting with an arbitrary multicategory, say \(\mathcal{C}\), and layering over the top of it another category, say \( \mathcal{F} \) (the filter). The following image illustrates this idea:

The job of this layer is to act as a filter, and only let through specific structures. The above image captures this idea by having the filter, \(\mathcal{F}\) over \(\mathcal{C}\), and the filter is shown by making the filter layer a little transparent so that we can only make out some of the bottom layer. The way in which the filter will restrict \(\mathcal{C}\) will become more apparent when we give the definition of a FC. In fact, lets do that now.

Filtered Categories

A filtered category (FC) is a multicategory defined in terms of two structures given by a pair denoted \(\ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)} \) where \(\mathcal{C}\) is a multicategory called the lax component, \(\mathcal{W}\) is a collection of worlds, and \( \mathsf{Cat}( \mathcal{W} ) \) is \( \mathcal{W} \times \mathcal{W} \) considered as a category which we call the constraint universe of the FC. A FC, \(( \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) )\), is defined as follows:

  • (Objects) A collection of objects each denoted \(\ottmv{A} \ottsym{@} \ottmv{w}\), where \( \ottmv{A} \in \mathsf{Obj}( \mathcal{C} ) \) and \( \ottmv{w} \in \mathcal{W} \).
  • (Morphisms) A collection of multimorphisms. Each morphism is a triple \((f , M, \{(\ottmv{A_{{\mathrm{1}}}},\ottmv{w_{{\mathrm{1}}}}),\ldots,(\ottmv{A_{\ottmv{m}}},\ottmv{w_{\ottmv{m}}}),(B,w)\})\) where \(f \in \mathcal{C} \ottsym{(} \ottmv{A_{{\mathrm{1}}}} \ottsym{,} \, … \, \ottsym{,} \ottmv{A_{\ottmv{m}}} \ottsym{,} \ottmv{B} \ottsym{)} \), \(M\) — the morphism constraints — is a subcategory of \( \mathsf{Cat}( \mathcal{W} ) \), such that, \(\{\ottmv{w_{{\mathrm{1}}}},\ldots,\ottmv{w_{\ottmv{m}}},\ottmv{w}\} \subseteq \mathsf{Obj}( \ottmv{M} ) \). We denote each morphism by \(f : \ottmv{M}; \ottmv{A_{{\mathrm{1}}}} \ottsym{@} \ottmv{w_{{\mathrm{1}}}},\ldots,\ottmv{A_{\ottmv{m}}} \ottsym{@} \ottmv{w_{\ottmv{m}}} \to \ottmv{B} \ottsym{@} \ottmv{w}\).
  • (Identities) For any list of objects, \(\ottmv{A_{{\mathrm{1}}}} \ottsym{@} \ottmv{w_{{\mathrm{1}}}}, \ldots, \ottmv{A} \ottsym{@} \ottmv{w}, \ldots, \ottmv{A_{\ottmv{i}}} \ottsym{@} \ottmv{w_{\ottmv{i}}}\) there is a morphism \(\mathsf{id} : M;\ottmv{A_{{\mathrm{1}}}} \ottsym{@} \ottmv{w_{{\mathrm{1}}}}, \ldots, \ottmv{A} \ottsym{@} \ottmv{w}, \ldots, \ottmv{A_{\ottmv{i}}} \ottsym{@} \ottmv{w_{\ottmv{i}}} \to \ottmv{A} \ottsym{@} \ottmv{w}\).
  • (Vertical Composition) For any two morphisms \(f \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)} ( \ottmv{M} ; \ottmv{A_{{\mathrm{1}}}} @ \ottmv{w_{{\mathrm{1}}}} , \ldots, \ottmv{A_{\ottmv{i}}} @ \ottmv{w_{\ottmv{i}}} , \ottmv{B} @ \ottmv{w’} ) \) and
    \(g \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M';C_{{\mathrm{1}}} \ottsym{@} \ottmv{w’_{{\mathrm{1}}}}, \ldots, \ottmv{B} \ottsym{@} \ottmv{w’}, \ldots,C_{\ottmv{j}} \ottsym{@} \ottmv{w’_{\ottmv{j}}}, \ottmv{D} \ottsym{@} \ottmv{w”})\) their composition is also a morphism, that is, \(f;g \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M,M';C_{{\mathrm{1}}} \ottsym{@} \ottmv{w’_{{\mathrm{1}}}},\ldots,\ottmv{A_{{\mathrm{1}}}} \ottsym{@} \ottmv{w_{{\mathrm{1}}}},\ldots,\ottmv{A_{\ottmv{i}}} \ottsym{@} \ottmv{w_{\ottmv{i}}},\ldots,C_{\ottmv{j}} \ottsym{@} \ottmv{w’_{\ottmv{j}}},\ottmv{D} \ottsym{@} \ottmv{w”})\).
  • (Horizontal Source Composition) For any morphisms \(f \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M;A_1@w_1, \ldots, A@w, \ldots,A_n@w_n,B@w”)\) and \(\alpha \in M(w,w’) \) the horizontal composition \(f \rightharpoonup \alpha :M;A_1@w_1, \ldots, A@w’, \ldots,A_n@w_n,B@w” \) is also a morphism.
  • (Horizontal Target Composition) For any morphisms \(f \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M;A_1@w_1, \ldots, A_n@w_n,B@w)\) and \(\alpha \in M(w,w’) \) the horizontal target composition \(f \rightharpoondown \alpha :M;A_1@w_1, \ldots, A_n@w_n,B@w’ \) is also a morphism.
  • Finally, we have the following axioms:
    • (Vertical Identity) For any morphism \( f \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M;A_1@w_1, \ldots, A@w, \ldots, A_i@w_i,B@w’)\) we have \(f ; \mathsf{id}_{B@w’} = f = \mathsf{id}_{A@w};f \).
    • (Associativity of Vertical Composition) For any morphisms\( f \in ( \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) )(M_1;A_1@w_1, \ldots, A_i@w_i,B@w’)\),\( g \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M_2;B_1@w_1, \ldots, B@w’, \ldots, B_i@w_i,C@w”)\), and \( h \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M_3;C_1@w_1, \ldots, C@w”, \ldots, C_i@w_i,D@w”’)\) we have \(f;(g;h) = (f;g);h\).
    • (Horizontal Source Identity) For any morphism \( f \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M;A_1@w_1, \ldots, A@w, \ldots, A_i@w_i,B@w’)\) we have \(f \rightharpoonup \mathsf{id}_w = f\).
    • (Horizontal Source Unfold) For any morphisms \(f \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M;A_1@w_1, \ldots, A@w, \ldots,A_n@w_n,B@w”’)\), \(\alpha_1 \in M(w,w’) \), and \(\alpha_2 \in M(w’,w”) \) we have \(f \rightharpoonup (\alpha_1;\alpha_2)\) = \((f \rightharpoonup \alpha_1) \rightharpoonup \alpha_2\).
    • (Horizontal Source Symmetry) For any morphisms \(f \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M;A_1@w_1, \ldots, A@w, \ldots,A_n@w_n,B@w”’)\), \(\alpha_1 \in M(w,w’) \), and \(\alpha_2 \in M(w’,w) \) we have \(f = f \rightharpoonup (\alpha_1;\alpha_2)\).
    • (Horizontal Target Identity) For any morphism \( f \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M;A_1@w_1, \ldots, A_i@w_i,B@w)\) we have \(f \rightharpoondown \mathsf{id}_w = f\).
    • (Horizontal Target Unfold) For any morphisms \(f \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M;A_1@w_1, \ldots,A_n@w_n,B@w)\), \(\alpha_1 \in M(w,w’) \), and \(\alpha_2 \in M(w’,w”) \) we have \(f \rightharpoondown (\alpha_1;\alpha_2)\) = \((f \rightharpoondown \alpha_1) \rightharpoondown \alpha_2\).
    • (Horizontal Target Symmetry) For any morphisms \(f \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M;A_1@w_1, \ldots,A_n@w_n,B@w)\), \(\alpha_1 \in M(w,w’) \), and \(\alpha_2 \in M(w’,w) \) we have \(f = f \rightharpoondown (\alpha_1;\alpha_2)\).
    • (Horizontal-Vertical Symmetry) For any morphisms \(f \in \ottsym{(} \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \ottsym{)}(M;A_1@w_1, \ldots,A_n@w_n,B@w)\), \(\alpha_1 \in M(w_i,w’_i) \), and \(\alpha_2 \in M(w,w’) \) we have \((f \rightharpoonup \alpha_1) \rightharpoondown \alpha_2)\) = \((f \rightharpoondown \alpha_2)\rightharpoonup \alpha_1)\).

The categories on morphisms should be thought of as being constraints on particular morphisms, and thus, the only part of the category that matters is the set of morphisms. So we will denote these categories as simply lists of morphisms, that is, \( \ottmv{w_{{\mathrm{1}}}} \leq \ottmv{w_{{\mathrm{2}}}} , \ldots, \ottmv{w_{\ottmv{i}}} \leq \ottmv{w}_{i+1}\) but with the additional property that this list should be a category, and hence, identity and composition morphisms must be present. That being said, these constraint categories could get large due to having to have identities for every object, and the composition morphisms, so we will often only list out the non-trivial morphisms and leave the existence of the identity and composition morphisms implicit. The notation \(\ottmv{M_{{\mathrm{1}}}},\ottmv{M_{{\mathrm{2}}}}\) for categories \(\ottmv{M_{{\mathrm{1}}}}\) and \(\ottmv{M_{{\mathrm{2}}}}\) is a combination operation. This notation stands for the category defined as the union of the two categories.

The category \(\ottmv{M_{{\mathrm{1}}}},\ottmv{M_{{\mathrm{2}}}}\) is defined as follows:

  • \( \mathsf{Obj}( \ottmv{M_{{\mathrm{1}}}} \ottsym{,} \ottmv{M_{{\mathrm{2}}}} ) = \mathsf{Obj}( \ottmv{M_{{\mathrm{1}}}} ) \cup \mathsf{Obj}( \ottmv{M_{{\mathrm{2}}}} ) \)
  • For any objects \(A,B \in \mathsf{Obj}( \ottmv{M_{{\mathrm{1}}}} \ottsym{,} \ottmv{M_{{\mathrm{2}}}} ) \), either \((\ottmv{M_{{\mathrm{1}}}} \ottsym{,} \ottmv{M_{{\mathrm{2}}}})(A,B) = \ottmv{M_{{\mathrm{1}}}}(A,B)\) if \(A,B \in \ottmv{M_{{\mathrm{1}}}}\), or \((\ottmv{M_{{\mathrm{1}}}} \ottsym{,} \ottmv{M_{{\mathrm{2}}}})(A,B) = \ottmv{M_{{\mathrm{2}}}}(A,B)\) if \(A,B \in \ottmv{M_{{\mathrm{2}}}}\).

Keeping with the theme of representing the constraint categories as lists of non-trivial morphisms we define the union — the operation defined above — of \(\ottmv{M_{{\mathrm{1}}}} = \ottmv{w_{{\mathrm{1}}}} \leq \ottmv{w_{{\mathrm{2}}}} , \ldots, \ottmv{w_{\ottmv{i}}} \leq \ottmv{w}_{i+1}\) and \(\ottmv{M_{{\mathrm{2}}}} = \ottmv{w’_{{\mathrm{1}}}} \leq \ottmv{w’_{{\mathrm{2}}}} , \ldots, \ottmv{w’_{\ottmv{j}}} \leq \ottmv{w’}_{j+1}\) as \(\ottmv{M_{{\mathrm{1}}}} \ottsym{,} \ottmv{M_{{\mathrm{2}}}} = \ottmv{w_{{\mathrm{1}}}} \leq \ottmv{w_{{\mathrm{2}}}} , \ldots, \ottmv{w_{\ottmv{i}}} \leq \ottmv{w}_{i+1}, \ottmv{w’_{{\mathrm{1}}}} \leq \ottmv{w’_{{\mathrm{2}}}} , \ldots, \ottmv{w’_{\ottmv{j}}} \leq \ottmv{w’}_{j+1} \).

Some Intuition

The best intuition I can give for a FC, \( \mathcal{C} \downarrow \mathsf{Cat}( \mathcal{W} ) \), is to think about the elements of \(\mathcal{W}\) as worlds where objects of \(\mathcal{C}\) live. Then one should think of the morphisms of \( \mathsf{Cat}( \mathcal{W} ) \) as transitions between the worlds. Thus, the reader familiar with Kripke models should be able to see that we are considering \( \mathsf{Cat}( \mathcal{W} ) \) as imposing a Kripke like structure over \(\mathcal{C}\).

Now horizontal composition corresponds to monotonicity if we think of a morphism as a proof of its target assuming its list of source objects. Notice, that horizontal composition only moves forward, that is because, when formulas are true in a Kripke model we can only move them forward along the transition relation. So since there is no negation here we can only move forward.


Every category, \(\mathcal{C}\), is isomorphic to a FC, \((\mathcal{C}\downarrow\mathsf{Cat}(\mathcal{W}))\) where \(\mathsf{Cat}(\mathcal{W})\) is a discrete category with only one object \(w\). The constraint universe has only one morphism, the identity on \(w\), and hence, the only constraints on morphisms in the FC is this single identity morphism. So we can define bijective functors as follows:

    • \(\mathsf{Lax}(A@w) = A\)
    • \(\mathsf{Lax}(f : \mathsf{Cat}(\mathcal{W});A_1@w,\ldots,A_i@w \to B@w) = f : A_1,\ldots,A_i \to B\)
    • \(\mathsf{Filtered}(A) = A@w\)
    • \(\mathsf{Filtered}(f : A_1,\ldots,A_i \to B) = f : \mathsf{Cat}(\mathcal{W});A_1@w,\ldots,A_i@w \to B@w\)


One can define functors between filtered categories:

A functor, \(H\), between two FCs \((\mathcal{C_1}\downarrow\mathsf{Cat}(\mathcal{W}_1))\) and \((\mathcal{C_2}\downarrow\mathsf{Cat}(\mathcal{W}_2))\) is a pair of functors \((F : \mathcal{C}_1 \to \mathcal{C}_2, G : \mathsf{Cat}(\mathcal{W}_1) \to \mathsf{Cat}(\mathcal{W}_2) )\) defined as follows:

  • (Objects) \(H(A@w) = F(A)@G(w)\)
  • (Morphisms) \(H(f : M;A_1@w_1,\ldots,A_i@w_i \to B@w) =\)
    \(F(f) : G(M);F(A_1)@G(w_1),\ldots,F(A_i)@G(w_i) \to F(B)@G(w)\)

Since \(F\) and \(G\) are functors \(H\) will respect both vertical and horizontal compositions as well as identities.

Structures in FCs

FCs can have two different types of structures: structures inherited from the lax component, and structures global to the FC. The latter are the usual structures built from morphisms such as products, co-products, and exponentials. However, the former are structures that are constrained versions of the structures in the lax component. Defining these is were the constraints are really going to come into play. Now there are lots of ways one could constrain the structures of the lax component using the constraint universe. The following are examples of the types of structures I will need when interpreting DIL/DTT. Throughout this section I will denote lists of objects of an FC by \(\Gamma\) and \(\Delta\).

A FC, \((\mathcal{C}\downarrow\mathcal{W})\), has a pointed initial object, denoted \(\perp @ w \), if for any category \(M\), worlds \(w,w’ \in \mathsf{Obj}(M)\), sequences of objects \(\Gamma\) and \(\Delta\), and object \(A@w’ \in \mathsf{Obj}(\mathcal{C}\downarrow\mathcal{W})\), there exists a unique morphism \(\triangle : M; \Gamma,\perp@w,\Delta \to A@w’\).
A FC, \((\mathcal{C}\downarrow\mathcal{W})\), has a pointed final object, denoted \(\top @ w \), if for any category \(M\), world \(w \in \mathsf{Obj}(M)\), and sequence of objects \(\Gamma\), there exists a unique morphism \(\nabla : M; \Gamma \to \top @ w\).
A FC, \((\mathcal{C}\downarrow\mathcal{W})\), has pointed products, if for any category \(M\), world \(w \in \mathsf{Obj}(M)\), two objects \(A@w\) and \(B@w\) of \((\mathcal{C}\downarrow\mathcal{W})\), and morphisms \(f : M_1; \overrightarrow{C@w’} \to A@w\) and \(g : M_2 ; \overrightarrow{C@w’} \to B@w\), there is an object denoted \((A \times B)@w\), two morphisms \(\pi_1 : M ; (A \times B)@w \to A@w\) and \(\pi_2 : M ; (A \times B)@w \to B@w\), and there exists a unique morphism
\(\langle f,g\rangle : M_1,M_2; \overrightarrow{C@w’} \to (A \times B)@w\) in \((\mathcal{C}\downarrow\mathcal{W})\) where the following diagram commutes:
We call a FC pointed cartesian if all finite pointed products exist, and there exists a pointed final object.
A FC, \((\mathcal{C}\downarrow\mathcal{W})\), has pointed co-products, if for any category \(M\), world \(w \in \mathsf{Obj}(M)\), two objects \(A@w\) and \(B@w\) of \((\mathcal{C}\downarrow\mathcal{W})\), and morphisms \(f : M_1; A@w \to C@w’\) and \(g : M_2 ; B@w \to C@w’\), there is an object denoted \((A + B)@w\), two morphisms \(\iota_1 : M ; A@w \to (A + B)@w\) and \(\iota_2 : M ; B@w \to (A + B)@w\), and there exists a unique morphism
\([f,g] : M_1,M_2; (A + B)@w \to C@w’\) in \((\mathcal{C}\downarrow\mathcal{W})\) where the following diagram commutes:
We call a FC pointed bi-cartesian if all finite pointed co-products exist, and there exists a pointed inital object.

I think I will leave this post here. This is just a look at what I am considering. In the close future I will be thinking about internal homs in FC, and generalizing to polycategories, then after these are figured out, we will be able to interpret DIL and DTT in a categorical framework.

Posted in category theory, computer science, Logic, metatheory, Semantics, Type Theory | Leave a comment

Linear bi-Intuitionistic logic cannot be full

This post was written while I was attending VSL from July 12 – July 17 2014.

This week I am in Vienna — the capital of Austria — attending the Vienna Summer of Logic (VSL). It has been a blast! The feeling I get when attending a conference is pure unadulterated motivation, creativity, and excitement. I just love this atmosphere. So many great researchers giving great talks about interesting work.

So what do I do with this excitement you might ask? I start to reflect on my own research quite deeply, and ponder if what I am doing is the best I can do, and if the answer is no, then I begin fixing it so that it is. Now, by “the best I can do” I do not mean “will get into the best conference/journal” or “will make me famous”, but rather only that it meets my personal standards, basically I want to be able to be proud that my work is my work and anything less is a waste. An example of this was when I attended a three week summer school in Eugene Organ called the Organ Programming Languages Summer School. There was the first time I got this feeling, and I ended up reworking how proofs of normalization using hereditary substitution worked from the ground up.

So what am I thinking about now that I have this feeling again? My research focused on understanding bi-intuitionistic simple type theory with a non-trivial categorical model. The work I have done so far — in collaboration with Aaron Stump and Ryan McCleeary — can be found here and was presented at The 5th International Workshop on Classical Logic and Computation (CL&C) 2014 on Sunday July 13.

Currently, Dualized Type Theory (DTT) has a sequent that looks like \(G;x_1 : p_1 A_1@n_1,\ldots,x_j : p_j A_j@n_j \vdash t : p B @ n \). There each \(x_i\) and \(t\) are terms while \(G\) is a graph — defined as a list of ordered pairs — and each \(n_i\) and \(n\) are nodes in \(G\), finally each \(p_i\) and \(p\) are polarities — either \(+\) or \(-\). So what is the deal with the graph and node labels on types?

Tristan Crolard was the first to study bi-intuitionistic logic and type theory in the context of computer science. He proposed a multi-succedent sequent calculus called Subtractive Logic, but this sequent calculus has a defect. It requires the Dragalin restriction. This restriction requires that the premise of the right implication inference rule have only one formula on the right side of the sequent. Just as linear logicians observed this breaks cut-elimination! Luis Pinto and Tarmo Uustalu proposed a means of fixing this issue by first starting with LK+subtraction, and then adding a labeling system using graphs that semantically correspond to constraints on the set of Kripke models. This system they called labeled BINT. DTT can be seen as a simple type theory of a simplified version of labeled BINT called Dualized Intuitionistic Logic (DILL).

At this point I should quickly explain — intuitively — what BINT is, but I am not going to go into to much detail. See the references I have been linking to. My slogan is “BINT is intuitionistic logic with perfect duality.” Now “perfect” may be to strong of a word, but it gets my point across. BINT is a logic where for every logical connective of the logic its dual is also a logical connective of the logic. That is, BINT consists of true and false, conjunction and disjunction, and implication and its dual called subtraction or exclusion. Cecylia Rauszer was the first person to study BINT where she proposed an extended version of the Kripke Semantics for intuitionistic logic.

Lets consider the extended Kripke Semantics proposed by Cecylia. It may help with understanding subtraction. The interpretation of implication into a Kripke Model is defined as follows:

\( \interp{A \to B}_w = \forall w’.wRw’ \Rightarrow \interp{A}_{w’} \Rightarrow \interp{B}_{w’}\)

This states that \(A \to B\) holds in an arbitrary Kripke Model at world \(w\) if for any future world \(w’\), reachable from \(w\), and assuming \(A\) holds in the model at the future world \(w’\), and we can show that \(B\) holds at world \(w’\). The key observation to make regarding this interpretation is that it is modal, and talks about all future worlds. This modal aspect is what connects the semantics of modal logic S4 to Kripke semantics. Now to obtain the interpretation of subtraction in Kripke semantics is by simply taking the dual of the above definition:

\( \interp{A – B}_w = \exists w’.w’Rw \land \interp{A}_{w’} \land \lnot\interp{B}_{w’}\)

This shows that we move from talking about all future worlds to talking about the existence of a past world!

It is well-known that the simply-typed \(\lambda\)-calculus can be interpreted in cartesian-closed categories (CCC). Tristan Crolard showed that if one extends a CCC with co-products (which corresponds to disjunction) and co-exponentials or the left adjoint of the co-product (which corresponds to subtraction) then one obtains what he named a bi-[CCC], but even worse every bi-[CCC] is equivalent — up to isomorphism — to a preorder. That is, there is at most one morphism between any two objects of the bi-[CCC]. Therefore, this does not provide for a very good proof theory of BINT!

So I pose the question, does there actually exist a non-trivial proof theory for BINT? The answer may very well be no, but I have some ideas to explore before making that bold claim. That’s the topic of my next post. If the answer is indeed no, then that implies that we must move away from trying to come up with a satisfactory proof theory of bi-intuitionistic logic in the sense of a bi-[CCC]. Instead we must insist that bi-intuitionistic logic be split into two fragments: a positive fragment (or as Paul-Andrea Millies calls it the prover) and a negative fragment (denier). We can also consider the positive fragment being intuitionistic logic, and the negative fragment co-intuitionistic logic — actually this will be slightly different, because we cannot have an isomorphism between both sides. However, before going to that extreme there are a few ideas I want to explore first.

The first idea I explored was moving to linear logic and combining the idea of Crolard’s bi-[CCC]’s with the idea of Nick Benton’s mixed linear non-linear models of linear logic. I thought that by doing this we could obtain a non-trivial categorical model of bi-intuitionistic linear logic (linear BINT), and by the use of Girard’s translation a categorical model of BINT. However, this does not fix the problem, the embedding of BINT into linear BINT is degenerate by the same argument as Crolard’s. Let’s consider a diagram that may help situate our minds on this idea:

The diagram shows that we can view the model of linear BINT as the union of a symmetric monoidal closed category and its opposite. Then using Girard’s embedding we can obtain an adjunction between a bi-[CCC] (full BINT) and the previous union.

To use Crolard’s argument all we have to show is that by way of Girard’s translation that the co-Kleisli category is cartesian closed and the Kleisli category is co-cartesian co-closed. The former follows from Seely’s theorem in Gavin Bierman’s thesis (Proposition 17 on p. 156), and the latter follows by duality. We briefly show how to obtain the fact that intuitionistic implication is modeled by the right adjoint to the product functor:

The co-Kleisli category, \(\mathcal{C}_!\), is cartesian closed.
We have the following chain of isomorphisms:

\mathcal{C}_!(A \times B, C) & \cong & \mathcal{C}_!(!(A \times B), C) & (\text{Def.})\\
& \cong & \mathcal{C}_!(!A \times !B, C) & (\text{Seely’s axiom}\,n)\\
& \cong & \mathcal{C}_!(!A, !B -\negthinspace\negthinspace *\, C) & (\text{Closure.})\\
& \cong & \mathcal{C}_!(A, !B -\negthinspace\negthinspace * \,C) & (\text{Def.})\\

Note that the symbol \(-\negthinspace\negthinspace *\) is linear implication. I could not get the CMLL latex library to work with MathJax yet.

Now by duality we have the following result.

The Kleisli category, \(\mathcal{C}_?\), is co-cartesian co-closed.

Therefore, any categorical model of linear BINT with linear exponentials is degenerate just as Crolard established.

We could either remove the \(!\)-exponential (of-course) or the \(?\)-exponential (why-not) to obtain a model that looks like one of these:

Both of these recover the fact that we can embed either intuitionistic logic (the first diagram) into linear logic, or co-intuitionistc logic (the second diagram). However, both models are more restrictive than our initial idea, but this is to be expected, because something has to give or we would end up with a degenerated model after all.

We could also take the union of a CCC and a SMcC which would mean that the majority of the logic is non-linear, but subtraction is left linear. Now this is less appealing, because it is not as elegant as the other linear logic formulations (previous two diagrams) and it is no more expressive than the previous two diagrams especially the first of the two.

All of the approaches I have been discussing so far have been trying to keep the model — and also the logic — to be as collapsed as possible. I mentioned above that we could also separate the postive fragment from the negative fragment, and then relate them using a weak equivalence — an adjunction. Paul-Andrea Millies calls this a chirality. The term comes from chemistry and means something to the effect of “mirror symmetry.”

This is all I want to say in this post. I now conclude with some final remarks. My reflecting I think has paid off, as it usually does, and I believe I now have a novel idea for two different, but related non-trivial categories models of DIL and DTT. These models will indeed be collapsed, but constrained by an external structure. I will leave the reader with a hint about my latest idea. DIL and DTT may have a categorical model where the external structure is a coalgebra. Exciting I know!

Posted in category theory, Logic, metatheory, Semantics, Type Theory | Leave a comment

On to the next big adventure!

Over the course of the last five years I have had the honor of being one of Aaron Stump’s Ph.D. students. I am happy to report that I have successfully defended my dissertation, and have accepted a new academic position.

My dissertation is titled “The Semantics Analysis of Advanced Programming Languages” and contains most of my published and unpublished research over the course of my graduate experience. Here is the abstract:

We live in a time where computing devices power essential
systems of our society: our automobiles, our airplanes and even our
medical services. In these safety-critical systems, bugs do not
just cost money to fix; they have a potential to cause harm, even
death. Therefore, software correctness is of paramount importance.

Existing mainstream programming languages do not support
software verification as part of their design, but rely on testing,
and thus cannot completely rule out the possibility of bugs during
software development. To fix this problem we must reshape the very
foundation on which programming languages are based. Programming
languages must support the ability to verify the correctness of the
software developed in them, and this software verification must be
possible using the same language the software is developed in. In
the first half of this dissertation we introduce three new
programming languages: Freedom of Speech, Separation of Proof from
Program, and Dualized Type Theory. The Freedom of Speech language
separates a logical fragment from of a general recursive programming
language, but still allowing for the types of the logical fragment
to depend on general recursive programs while maintaining logical
consistency. Thus, obtaining the ability to verify properties of
general recursion programs. Separation of Proof from Program builds
on the Freedom of Speech language by relieving several restrictions,
and adding a number of extensions. Finally, Dualized Type Theory is
a terminating functional programming language rich in constructive
duality, and shows promise of being a logical foundation of
induction and coninduction.

These languages have the ability to verify properties of
software, but how can we trust this verification? To be able to put
our trust in these languages requires that the language be
rigorously and mathematically defined so that the programming
language itself can be studied as a mathematical object. Then we
must show one very important property, logical consistency of the
fragment of the programming language used to verify mathematical
properties of the software. In the second half of this dissertation
we introduce a well-known proof technique for showing logical
consistency called hereditary substitution. Hereditary substitution
shows promise of being less complex than existing proof techniques
like the Tait-Girard Reducibility method. However, we are unsure
which programming languages can be proved terminating using
hereditary substitution. Our contribution to this line of work is
the application of the hereditary substitution technique to
predicative polymorphic programming languages, and the first proof
of termination using hereditary substitution for a classical type

You can find my dissertation on my webpage here. The physical beast:

My defense was a lot of fun. My wife came which was the first time she has seen me give a talk. In addition, some past professors from my alma matar — Millikin University — virtually attended using +Google Hangouts. The title slide from my defense:

All in all I am extremely excited to have these:

I am now off on the next big adventure as an Assistant Professor of Computer Science in the Computer and Information Sciences Department at Georgia Regents University in August GA.

We packed all of our stuff into these big pods:

They are operated by ABF freight and I highly recommend them. They are the best part about movers, but are half the price. ABF delivers the pods to your house, and then you pack them up, finally when you are ready for them to be shipped to your destination you call ABF and have them come and get them. Now these pods are shipped like a package so they take 5 – 7 days to reach their desinition, so you have to make sure and plan accordingly. They arrived a day later in GA then we had atticipated so we had to wait around a bit. Using ABF to ship our stuff was a lot easier than having to drive a huge moving truck from IA to GA.

We are tired of living in an apartment so we decided to get a rental house:

We have been here for about a week and a half, and we absolutely love it. The neighborhood is very quite, and its only about 12 miles from campus.

I officially start work on August 11, and I am going to be teaching two courses: Programming Languages and Theory of Computation. The former is going to be devided into two halfs: a theory half and a practical half. The former is going to explore basing functional programming languages on typed \(\lambda\)-calculi and using abstract mathematics to be a better programmer. Then the latter half will primarly be about learning to implement prototype PLs in Haskell. The course project is an implementation of Girard/Reynold’s system F which is at the core of Haskell. Theory of computation is going to be a lot of fun as well. The main project for that is going to be a Turing machine simulator that will then be used to compile a prototype imparative PL to. Very exciting indeed!

That concludes this update. I am very excited about the future and being able to teach and continue doing research. It really is a dream come true.

Posted in General | 3 Comments

Diagram Schemes Need Not Be Posets

My future research will — as much as possible — be at least partially formalized in a proof assistant like Coq or Agda. I am a big Agda fan, so naturally, I will stick with it, but Coq definitely has some advantages and may be used for some projects.

I am graduating this summer and so I have been pondering what direction I want to go in my research, and I have decided that I want to try and contribute to areas where category theory plays a major role. So areas like categorical logic, and categorical semantics of programming languages. However, I am also interested in new areas of exploration using category theory. For example, Dusko Pavlovic’s initiative to study security using category theory seems very promising.

So to help with my future research I have been developing my own library for doing categorical logic in Agda. I call it the Lawvere categorical logic library or law for short. It can be found here: Law is based on total setoids because they allow for the definition of quotients and — more importantly for my up coming work — subsets in type theory. See the paper “Setoids in Type Theory” by Barthe et. al for more about that. Now I am well aware that there are a number of category theory libraries out there in the cloud, but rolling my own allowed me to learn a lot, and concentrate on what I am interested in. I am only going to formalize the category theory I need, and I wanted to learn how to use setoids. So this has many benefits for me.

I have a bunch of definitions implemented so far. See the README for a list. A week or so ago I wanted to formalize something small using law that would stress test some of my definitions. So I decided to formalize the notion of a diagram as a functor from an index category — the scheme — to some other category.

Now depending on what properties the index category has influences the type of diagram one has. According to Wikipedia a commutative diagram is a diagram with a poset for the scheme. Now a poset is a preorder with one additional property. A preorder (PO) is a category where for any two objects \( A \) and \( B \) there is at most one morphism between \( A \) and \( B \). This tells us that if a morphism exists then it is unique. Now a poset has the additional property which states that if there are two morphisms \( f \in Hom(A,B) \) and \(g \in Hom(B,A) \) then \( A = B \). Certainly, one can see that commutativity of a diagram is implied by the property of a PO, because if we have two or more compositions between two objects \( A \) and \( B \), then it must be the case that those compositions are equivalent.

I claim that posets are too strong in general for commutative diagrams. All we really need is a functor from a PO to some other category. To convince the reader I will now show that any commutative square in a category \( \mathcal{C} \) can be modeled by a functor from a PO. All of this post has been formalized in Agda, and I will do my best to point to the respective files throughout the remainder of this post.

First, we define a PO with exactly four objects and five non-identity morphisms. In fact, we can depict this category as a graph:

I left identity arrows out of the previous graph, but we do have them, and we name them \( id_n : i_n \to i_n \) for \(n \in \{1,2,3,4\} \). Now I call this category the \(4\text{PO}\). It’s definition in Agda can be found in the module of the same name here. It is easy to see that the above graph is a commutative square, because we only have one morphism between \(i_1\) and \(i_4\) and it is \(i_5\). Hence, \(f_1 ; f_3 = f_2 ; f_4 = f_5 \) — I will use diagram composition throughout this post e.g. given \(m_1 : A \to B\) and \(m_2 : B \to C\), then \(m_1;m_2 : A \to C\). The proof that \(4\text{PO}\) is in fact a PO is trivial, we can see it is from the graph above.

Now suppose we have a functor \(\mathsf{SQ} : 4\text{PO} \to \mathcal{C} \) for some category \(\mathcal{C}\), such that, \(\mathsf{SQ}(i_1) = A \), \(\mathsf{SQ}(i_2) = B \), \(\mathsf{SQ}(i_3) = D \), and \(\mathsf{SQ}(i_4) = C \). Furthermore, suppose \(\mathsf{SQ}(f_1) = g : A \to B \), \(\mathsf{SQ}(f_2) = h : A \to D \), \(\mathsf{SQ}(f_3) = j : B \to C \), and \(\mathsf{SQ}(f_4) = k : C \to D \). This tells us that we have the following picture in \(\mathcal{C}\):

Now we can show that the existence of the \(\mathsf{SQ}\) functor implies that the previous square commutes. The lemma in Agda is called Comm-Square-Commutes and can be found here.

First, notice that we know \( f_1;f_3 = f_2;f_4 \), by the PO property of \(4\text{PO}\). Thus, we know \( \mathsf{SQ}(f_1;f_3) = \mathsf{SQ}(f_2;f_4) \). Now consider the following:

\(\begin{array}\,g;h \\
\,\,\,= \mathsf{SQ}(f_1);\mathsf{SQ}(f_3) \\
\,\,\,= \mathsf{SQ}(f_1;f_3) \end{array} \)

\(\begin{array} \mathsf{SQ}(f_2;f_4) \\
\,\,\,= \mathsf{SQ}(f_2);\mathsf{SQ}(f_4) \\
\,\,\,= j ; k \end{array} \)
Therefore, \( g;h = j;k \). This result shows that in general posets are too strong and that POs are enough. In fact, we could use even weaker categories than POs for the index category. This is one of the benefits of working with diagrams defined as functors from index categories.

If anyone has any comments about my library or about diagrams as functors I would love to chat about it. Hopefully, I will have my post on LNL models done for next time.

Posted in Agda, category theory, computer science, formalized mathematics, Logic | Tagged , , , | 6 Comments

What have I been thinking about?

So Beeminder is about to go off on me if I do not post something soon.  Due to me graduating this semester, I have been very busy writing my thesis.  So blog posts have been put on the back burner for a bit, but I want to keep being active anyway.  I decided to post a snapshot of what I have been thinking about research wise.

The snapshot is in the form of a diagram.  This diagram illustrates all of the topics and how they fit together of my ongoing research program on developing a core logic where induction and coinduction can live harmoniously. A major property of this core logic is that it must have non-trivial categorical models.  This cannot be said about bi-intuitionistic logic currently.

Finally, here is what I am thinking about in a single diagram:



I am currently drafting a post explaining — in most likely too much detail — LNL-models. So stayed tuned for that.

Posted in category theory, computer science, Logic, metatheory, Semantics, Type Theory | Leave a comment

The Category of Deterministic Automata

One thing I have recently decided is that category theory provides a unique perspective of computation. Thus, it can be used throughout computer science. Many people in my field — theory of programming languages — already know this, or at least know that category theory can be used in the study and implementation of programming languages, but I claim more! I also claim that not every computer scientist knows that category theory can provide this alternate perspective. In a number of posts I am going to share some discoveries I have found in the literature that supports this claim. I am not sure on the amount of posts, but I plan to write a bunch. I really want to share some little nuggets of fun that I have been finding. In this post I want to define the category of deterministic automata. I learned of this from the wonderful reports titled “A Junction between Computer Science and Category Theory” by Goguen, Thatcher, Wagner, and Wright.


Throughout this post I often use the word “automaton” to mean “deterministic automaton.” Given a set of inputs called \(X\), a set of states \(S\), and a function \( \delta : X \times S \to S\), called the transition function, we can define the deterministic automata \(\mathcal{A} = \langle X,S,\delta \rangle\). The transition function \(\delta\) tells us how to move from one state to the next. We call \(\mathcal{A}\) a finite automata if and only if \(X\) and \(S\) are finite sets.

Lets consider a quick example. Suppose we have the following:

  • \(X = \{ A,B\} \)
  • \(S = \{ S_0,S_1,S_2\} \)
  • \(\delta(A, S_0) = S_1 \)
  • \(\delta(A, S_1) = S_1 \)
  • \(\delta(B, S_1) = S_2 \)
  • \(\delta(A, S_2) = S_1 \)
  • \(\delta(B, S_2) = S_2 \)

There is an equivalent graphical formulation of automaton. The graphical formulation of the previous definition is as follows:

We can see that the graphical formulation is a graph where the nodes are the states of the automaton, and the edges make up the definition of the transition function, and are labeled by the input symbols. The graphical formulation makes it really easy to determine reachability between states. For example, the state \(S_2\) is reachable from the state \(S_0\) using the input word \(AB\) by the following reasoning:

  • \(\delta(B, \delta(A, S_0)) = \delta(B, S_1) = S_2 \)

We can confirm this reasoning very easy using the graphical formulation, because we can simply follow the edge labeled \(A\) to the state \(S_1\), and then follow the edge labeled \(B\) to the state \(S_2\). Instead of transitions between edges one symbol at a time is it possible to extend \( \delta \) so that we can transition directly using groupings of symbols — these are usually called words?

It turns out that we can, and I know of two different ways this can be done. Perhaps there are more, but I want to give an extension of \( \delta \) that is somewhat different. First we must define the monoid of all functions between states, that is functions from \(S\) to \(S\). Suppose that we have two functions \(f : S \to S\) and \(g : S \to S\), then we can define a new function called \(f ; g : S \to S\). So suppose \(s \in S\), then \((f ; g)(s) = g(f(s))\). We can define a second function called the identity function \(id : S \to S\) by \(id(s) = s\), and we can see that \(f ; id = id ; f = f\) for any function \(f : S \to S\). This data defines the monoid \([S,S]\) of all functions between states, and I leave checking the axioms to the reader.

Suppose \(A \in X\), then notice that we can define a function \(\delta_A : S \to S\) as \(\delta_A(s) = \delta(A,s)\) for any \(s \in S\). Thus, \(\delta_A \in [S,S]\). Taking this one step further we can define the function \(\delta_X : X \to [S,S]\) as \(\delta_X(A) = \delta_A\). Based on the definition of an automaton we know that \(X\) is a set of symbols, but we can think of this as an alphabet, then we can use this alphabet as the carrier set of the monoid \(X^*\) which is the free monoid of all strings generated by juxtaposition of the elements of \(X\) where the empty string is the identity of the monoid. Since we can take a set of symbols and generate a free monoid we should be able to take a function from the set of symbols to a monoid and extend it into a monoid homomorphism from the free monoid generated from the domain, to the range monoid. This means we can take the function \(\delta_X : X \to [S,S] \) and extend it into a function \(\delta^*_X : X^* \to [S,S]\), where

  • \(\delta^*_X(A_1 \cdots A_n) = \delta^*_X(A_1);\cdots ;\delta^*_X(A_n) = \delta_X(A_1);\cdots ;\delta_X(A_n) = \delta_{A_1};\cdots ;\delta_{A_n}\)

The previous equation is the defining characteristic of a monoid homomorphism. This extension is called the universal property of \(X^*\). We leave the details of the proof that the extension is indeed unique to the reader. See Proposition 1.6 of the paper I cited above. Finally, we can define the extension of \(\delta\) called \(\delta^+\) by \(\delta^+(w,s) = (\delta^*_X(w))(s)\).
Recall our example from above we know

  • \(\delta(B, \delta(A, S_0)) = \delta(B, S_1) = S_2 \)

Now using the extension we can now see that

  • \(\begin{array}{llll}
    \delta^+(AB,S_0) & = (\delta^*_X(AB))(S_0) \\
    & = (\delta^*_X(A);\delta^*_X(B))(S_0) \\
    & = (\delta_A;\delta_B)(S_0)\\
    & = \delta_B(\delta_A(S_0))\\
    & = \delta(B, \delta(A, S_0) \\
    & = S_2

Automaton Homomorphisms

In this section I will define what I call automaton homomorphisms, but this is nonstandard terminology, but I think fits best. These are simple little mappings from one automaton to another with an additional invariant.

Suppose we have automatons \(\mathcal{A} = \langle X_{\mathcal{A}},S_{\mathcal{A}}, \delta_{\mathcal{A}}\rangle \) and \(\mathcal{B} = \langle X_{\mathcal{B}},S_{\mathcal{B}}, \delta_{\mathcal{B}} \rangle \). Then how would we map \(\mathcal{A}\) into \(\mathcal{B}\)? First, we would have to map \(X_{\mathcal{A}}\) into \(X_{\mathcal{B}}\) and map \(S_{\mathcal{A}}\) into \(S_{\mathcal{B}}\). Then using these mappings we would have to map \(\delta_{\mathcal{A}}\) into \(\delta_{\mathcal{B}}\), but we have to be careful, because this final mapping should not break transitions when moving from \(\mathcal{A}\) to \(\mathcal{B}\), that is if we can transition from one state to another in \(\mathcal{A}\), then the states these two map to in \(\mathcal{B}\) should also have a transition between them.

This intuition guides us to the definition of an automaton homomorphism between \(\mathcal{A}\) and \(\mathcal{B}\). These are pairs \((\phi, \psi)\) where \(\phi : X_{\mathcal{A}} \to X_{\mathcal{B}}\) and \(\psi : S_{\mathcal{A}} \to S_{\mathcal{B}}\). Furthermore, the following diagram must commute:

The definition of \(\phi \times \psi\) is rather simple, \((\phi \times \psi)(A,s) = (\phi(A),\psi(s))\). The previous diagram simply guarantees our invariant holds.

Automaton are Categories

Now I would like to show that automaton are in fact categories. This is a straightforward application of generating the free category from a graph. Suppose we have an automaton \(\mathcal{A} = \langle X,S,\delta\rangle \). Then we define the category \(\mathsf{Tr}_{\mathcal{A}}\) as follows:

  • Objects: \(|\mathsf{Tr}_{\mathcal{A}}|_0 = S\)
  • Morphisms: \(|\mathsf{Tr}_{\mathcal{A}}|_1 = \{w : S_1 \to S_2 \mid w \in X^* \text{ and } \delta^+(w,S_1) = S_2 \} \)

Now we have to check the axioms of a category. Clearly, the identity is \(\lambda : S_0 \to S_0\) for any state \(S_0 \in S \) where \(\lambda \in X^*\) is the empty string. Suppose \(w_1 : S_1 \to S_2 \in |\mathsf{Tr}_{\mathcal{A}}|_1\) and \(w_2 : S_2 \to S_3 \in |\mathsf{Tr}_{\mathcal{A}}|_1\). Then we know \(w_{1}w_{2} \in X^*\) and \(\delta^+(w_{1}w_{2},S_1) = S_3\). Therefore, \(w_{1}w_{2} : S_1 \to S_3 \in |\mathsf{Tr}_{\mathcal{A}}|_1\). So we can define composition as \(w_1 ; w_2 = w_{1}w_{2} : S_1 \to S_3\). The remainder of the axioms are easily verified. This construction is unique up to an isomorphism using the universal mapping property of the free category generated by the graph of the automaton.

The Category of Deterministic Automaton

Finally, we can now take everything we have defined so far, and put it all together to form the category of deterministic automaton \(\mathsf{DAut}\). The objects are all deterministic automaton. The morphisms are all automaton homomorphisms between deterministic automaton. Certainly there is an identity, for any automaton \(\mathcal{A}\) there is an identity homomorphism from \(\mathcal{A}\) to \(\mathcal{A}\) defined as follows \((\mathsf{id}_X,\mathsf{id}_S)(B,S_0) = (B,S_0)\), and the invariant holds:

Now we must define composition. Suppose \((\phi_1,\psi_1) : \mathcal{A} \to \mathcal{B}\) and \((\phi_2,\psi_2) : \mathcal{B} \to \mathcal{C}\) are automaton homomorphisms. Then their composition \((\phi_1,\psi_1) ; (\phi_2,\psi_2)\) is defined as \((\phi_1,\psi_1) ; (\phi_2,\psi_2) = (\phi_1;\phi_2,\psi_1;\psi_2)\). We must check that this definition of composition is an automaton homomorphisms, hence we must verify the invariant. We know by assumption that the invariant holds for \((\phi_1,\psi_1)\) and

Using these diagrams we can show that the following diagram commutes:

The following equational reasoning shows the previous diagram commutes:
\delta_{\mathcal{A}};\psi_1;\psi_2 & = (\delta_{\mathcal{A}};\psi_1);\psi_2 & \text{(associativity)}\\
& = ((\phi_1 \times \psi_1);\delta_{\mathcal{B}});\psi_2 & \text{(top diagram)}\\
& = (\phi_1 \times \psi_1);(\delta_{\mathcal{B}};\psi_2) & \text{(associativity)}\\
& = (\phi_1 \times \psi_1);((\phi_2 \times \psi_2);\delta_{\mathcal{C}}) & \text{(bottom diagram)}\\
& = (\phi_1 \times \psi_1);(\phi_2 \times \psi_2);\delta_{\mathcal{C}} & \text{(associativity)}\\
Thus, composition as we have defined is indeed an automaton homomorphism. I leave checking the remainder of the axioms of a category as an exercise.

This post talks specifically about deterministic automaton, but we can form the category of nondeterministic automaton as well. In fact, the definition is exactly the same. Then using these we can form the category \(\mathsf{Aut}\) of all automaton.

Fun Facts

Here I list some fun facts that can be deduced from the categories we have looked at in this post.

  • To check that a word \(w\) is accepted by an automaton \(\mathcal{A}\), one only needs to check that there exists two states \(S_0\) and \(S_1\) such that \(w \in \mathsf{TR}_{\mathcal{A}}(S_0,S_1)\).
  • An automaton \(\mathcal{A}\) is strongly connected if and only if \(\mathsf{TR}_{\mathcal{A}}(S_0,S_1) \not= \emptyset\) for all states \(S_0,S_1 \in S\).
  • An automaton \(\mathcal{A}\) is reachable from the state \(S_0\) if and only if \(\mathsf{TR}_{\mathcal{A}}(S_0,S_1) \not= \emptyset\) for all states \(S_1 \in S\).
  • All of the sets \(\mathsf{TR}_{\mathcal{A}}(S_0,S_1)\) are the sets definable by the automaton \(\mathcal{A} \). Furthermore, these correspond to subsets — including their finite unions — of \(X^*\).
Posted in category theory, computer science | Tagged , , | 2 Comments