1 Introduction
Secondorder transitiveclosure logic, , is an expressive declarative language that captures the complexity class [21]. It extends secondorder logic with a transitive closure operator over relations of relations, i.e., over super relations among relational structures. The super relations are defined by means of secondorder logic formulae with free relation variables. Already its monadic fragment, , allows the expression of complete problems in a natural and elegant manner. Consider, for instance, the well known Hamiltonian cycle query over the standard vocabulary of graphs, which is not expressible in monadic secondorder logic [13]. A graph has a Hamiltonian cycle if the following holds:

There is a relation such that iff , , and .

The tuple is in the transitive closure of , for some s.t. .
In the language of this can be written as follows:
where . Even some wellknown complete problems such as deciding whether a given quantified Boolean formula QBF is valid [27] can be expressed in MSO(TC) (see Section 3).
In general, SO(TC) offers an attractive framework for expressing properties in terms of declaratively specified computations at a high level of abstraction. There are many examples of graph computation problems that involve complex conditions such as graph colouring [4], topological subgraph discovery [19], recognition of hypercube graphs [18], and many others (see [9, 16, 17]). Such graph algorithms are difficult to specify, even by means of rigorous methods such as Abstract State Machines (ASMs) [10], B [2] or EventB [3], because the algorithms require the definition of characterising conditions for particular subgraphs that lead to expressions beyond firstorder logic. Therefore, for the sake of easily comprehensible and at the same time fully formal highlevel specifications, it is reasonable to explore languages such as SO(TC). Let us see an example that further supports these observations.
Selfsimilarity of complex networks [36] (aka scale invariance) has practical applications in diverse areas such as the worldwide web [14], social networks [20], and biological networks [32]. Given a network represented as a finite graph , it is relevant to determine whether can be built starting from some graph pattern by recursively replacing nodes in the pattern by new, “smaller scale”, copies of . If this holds, then we say that is selfsimilar.
Formally, a graph is selfsimilar w.r.t. a graph pattern of size , if there is a sequence of graphs such that , and, for every pair of consecutive graphs in the sequence, there is a partition of the set of nodes of which satisfies the following:

For every , the subgraph induced by in is isomorphic to .

There is a graph isomorphic to with set of nodes for some and set of edges

For very , the closed neighborhoods and of and in , respectively, are isomorphic.
It is straightforward to write this definition of selfsimilarity in SO(TC), for we can clearly write a secondorder logic formula which defines such a super relation on graphs and then simply check whether the pair of graphs is in the transitive closure of .
Highly expressive query languages are gaining relevance in areas such as knowledge representation (KR), rigorous methods and provers. There are several examples of highly expressive query languages related to applications in KR. See for instance the monadically defined queries in [35], the Monadic Disjunctive SNP queries in [5] or the guarded queries in [11]. All of them can be considered fragments of Datalog. Regarding rigorous methods, the TLA language [28] is able to deal with higherorder formulations, and tools such as the TLA Proof System^{1}^{1}1https://tla.msrinria.inria.fr/tlaps and the TLA ModelChecker (TLC)^{2}^{2}2https://lamport.azurewebsites.net/tla/tlc.html can handle them (provided a finite universe of values for TLC). Provers such as Coq^{3}^{3}3https://coq.inria.fr/ and Isabelle^{4}^{4}4https://isabelle.in.tum.de/ can already handle some highorder expression. Moreover, the success with solvers for the Boolean satisfiability problem (SAT) has encouraged researchers to target larger classes of problems, including complete problems, such as satisfiability of Quantified Boolean formulas (QBF). Note the competitive evaluations of QBF solvers (QBFEVAL) held in 2016 and 2017 and recent publications on QBF solvers such as [8, 31, 22] among several others.
We thus think it is timely to study which features of highly expressive query languages affect their expressive power. In this sense, provides a good theoretical base since, apart from been a highly expressive query language (recall that it captures ), it enables natural and precise highlevel definitions of complex practical problems, mainly due to its ability to express properties in terms of declaratively specified computations. While secondorder logic extended with the standard partial fixedpoint operator, as well as firstorder logic closed under taking partial fixedpoints and under an operator for nondeterministic choice, also capture the class of queries over arbitrary finite structure [33], relevant computation problems such as that in Example 1 are clearly more difficult to specify in these logics. The same applies to the extension of firstorder logic with the partial fixedpoint operator, which is furthermore subsumed by since it captures only on the class of ordered finite structures [1]. Note that coupled with hereditary finite sets and set terms, could be considered as a kind of declarative version of Blass, Gurevich, and Shelah (BGS) model of abstract state machine [7], which is a powerful language in which all computable queries to relational databases can be expressed [6].
Our results can be summarized as follows.

We investigate to what extent universal quantification and negation are important to the expressive power of . Specifically, we consider the case where TCoperators are applied only to secondorder variables. Of course, a secondorder variable can simulate a firstorder variable, since we can express already in firstorder logic (FO) that a set is a singleton. This, however, requires universal quantification.
We define a “purely existential” fragment of , , as the fragment without universal quantifiers and in which operators occur only positively and bind only tuples of relation variables. We show that the expressive power of this fragment collapses to that of existential FO.
For SO alone, this collapse is rather obvious and was already remarked by Rosen in the introduction of his paper [34]. Our result generalizes this collapse to include TC operators, where it is no longer obvious.

We investigate the expressive power of the monadic fragment, . On strings, this logic is equivalent to the complexity class NLIN. Already on unordered structures, however, we show that can express counting terms and numeric predicates in . In particular, can express queries not expressible in the fixpoint logic FO(LFP). We also discuss the fascinating open question whether the converse holds as well.

We compare the expressive power of to that of orderinvariant . Specifically, we show that can express queries not expressible in orderinvariant ; over monadic vocabularies, we show that orderinvariant MSO is subsumed by . Again, what happens over higherarity relations is an interesting open question.
This paper is organized as follows. In Section 2 definitions and basic notions related to are given. In Section 3 the complexity of model checking is studied. Section 4 is dedicated to establishing the collapse of to existential firstorder logic. Sections 5 and 6 concentrate on the relationships between and the counting extension and orderinvariant , respectively. We conclude with a discussion of open questions in Section 7.
2 Preliminaries
We assume that the reader is familiar with finite model theory, see e.g., [15] for a good reference. For a tuple of elements, we denote by the th element of the tuple. We recall from the literature, the syntax and semantics of firstorder () and secondorder () logic, as well as their extensions with the transitive closure operator (). We assume a sufficient supply of firstorder and secondorder variables. The natural number , is the arity of the secondorder variable . By variable, we mean either a firstorder or secondorder variable. Variables and have the same sort if either both and are firstorder variables, or both are secondorder variables of the same arity. Tuples and of variables have the same sort, if the lengths of and are the same and, for each , the sort of is the same as the sort of .
The formulas of are defined by the following grammar:
where and are secondorder variables, , are firstorder variables, and are disjoint tuples of variables of the same sort, and and are also tuples of variables of that same sort (but not necessarily disjoint). The set of free variables of a formula , denoted by is defined as usual. For the operator, we define
Above in the right side, in order to avoid cumbersome notation, we use , , and to denote the sets of variables occurring in the tuples.
A vocabulary is a finite set of variables. A (finite) structure over a vocabulary is a pair , where is a finite nonempty set called the domain of , and is an interpretation of on . By this we mean that whenever is a firstorder variable, then , and whenever is a secondorder variable of arity , then . In this article, structures are always finite. We denote also by . For a variable and a suitable value for that variable, denotes the structure over equal to except that is mapped to . We extend the notation also to tuples of variables and values, , in the obvious manner. We say that a vocabulary is appropriate for a formula if .
Let be a structure over and an formula such that is appropriate for . The satisfaction of by , denoted by , is defined as follows. We only give the cases for secondorder quantifiers and transitive closure operator; the remaining cases are defined as usual.

For secondorder variable : iff , for some .

For the case of the operator, consider a formula of the form and let . Define to be the following set
and consider the binary relation on defined as follows:
We set to hold if belongs to the transitive closure of . Recall that, for a binary relation on any set , the transitive closure of is defined by
By we denote the variant of in which the quantification of above is restricted to natural numbers . That is, consists of pairs such that is reachable from by in at most steps. Moreover, by and we denote the syntactic restrictions of and of the form
where , , , are tuples of secondorder variables (i.e. without firstorder variables). The logic then denotes the extension of secondorder logic with operator. Analogously, by , we denote the extension of firstorder logic with applications of such transitiveclosure operators that bind only firstorder variables.^{5}^{5}5In the literature is often denoted by .
3 Complexity of MSO(TC)
The descriptive complexity of different logics with the transitive closure operator has been thoroughly studied by Immerman. Let denote the fragment of in which secondorder variables are all of arity . [[23, 24]]

On finite ordered structures, firstorder transitiveclosure logic captures nondeterministic logarithmic space .

On strings (word structures), captures the complexity class .
See also the discussion in the conclusion section.
By the above theorem, captures nondeterministic linear space over strings. Deciding whether a given quantified Boolean formula is valid (QBF) is a wellknown complete problem [27]. Observe that there are complete problems already in ; in fact QBF is such a problem. Thus, we can conclude the following. The inclusion in is clear.
Proposition .
Data complexity of is complete.
We next turn to combined complexity of model checking. By the above proposition, this is at least hard. However, the straightforward algorithm for model checking clearly has polynomialspace combined complexity. We thus conclude:
Proposition .
Combined complexity of is complete.
For combined complexity, we can actually sharpen the hardness; already a very simple fragment of is complete.
Specifically, we give a reduction from the corridor tiling problem, which is a wellknown complete problem. Instance of the corridor tiling problem is a tuple , where is a positive natural number, , for some , is a finite set of tiles, are horizontal and vertical constraints, and are tuples of tiles from . A corridor tiling for is a function , for some , such that

and ,

, for and ,

, for and .
The corridor tiling problem is the following complete decision problem [12]:
Input: An instance of the corridor tiling problem.
Output: Does there exist a corridor tiling for ?
Let monadic denote the fragment of of the form , where is a formula of universal firstorder logic.
Combined complexity of model checking for monadic is complete.
Proof.
Inclusion to follows from the corresponding result for . In order to prove hardness, we give a reduction from corridor tiling. Let be an instance of the corridor tiling problem and set . Let be a vocabulary, where is a binary secondorder variable and are monadic secondorder variables. Let denote the structure over such that , is the canonical successor relation on , and, for each , and . Define
where and tuples of distinct monadic secondorder variables not in . We then define We claim that if and only if there exists a corridor tiling for , from which the claim follows. ∎
4 Existential positive SO(2TC) collapses to EFO
Let denote the syntactic fragment of in which existential quantifiers and the operator occur only positively, that is, in scope of even number of negations. In this section, we show that the expressive power of collapses to that of existential firstorder logic . In this section, operators are applied only to tuples of secondorder variables. As already discussed in the introduction, this restriction is vital: the formula expresses reachability in directed graphs, which is not definable even in the full firstorder logic.
To facilitate our proofs we start by introducing some helpful terminology.
Let and be tuples of the same length and a set of natural numbers. The difference of the tuples and is defined as follows
The similarity of tuples and is defined as follows
We say that the tuples and are pairwise compatible if the sets and are disjoint. The tuples and are pairwise compatible outside if and are disjoint. The tuples and are pairwise compatible if and are pairwise compatible and .
Let be vocabularies, a structure, and a tuple of elements of . The (quantifierfree) type of in is the set of those quantifier free formulae such that .
The following lemma establishes that operators that are applied to formulas can be equivalently expressed by the finite operator. Every formula of the form , where and , , , are tuples of secondorder variables, is equivalent with the formula , for some .
Proof.
Let , where is quantifierfree, and let denote the vocabulary of . We will show that for large enough and for all structures
From here on we consider and fixed; especially, by a constant, we mean a number that is independent of the model ; that is, it may depend on and .
It suffices to show the lefttoright direction as the converse direction holds trivially for all . Assume that . By the semantics of there exists a natural number and tuples of relations on such that , , and
(1) 
For each , let and let denote the vocabulary of . By the semantics of the existential quantifier, (1) is equivalent to saying that
(2) 
for some tuples from . We will prove the following claim.
Claim: There exists an index set and mutually pairwise compatible sequences in that have a common type provided that is a large enough constant.
Proof of the claim: Let denote the longest (not necessarily consecutive) subsequence of that have a common type. Since there are only finitely many types, can be made as large as needed by making a large enough constant.
We will next show that there exists mutually pairwise compatible sequences in for some (provided that is large enough). Set . In the construction below we maintain the following properties for :

For each and for each tuple and in it holds that .

The length of is as long a constant as we want it to be.
For , let be a maximal collection (in length) mutually pairwise compatible sequences from . If we are done. Otherwise note that, since each is an tuple, the number of different points that may occur in is . By an inductive argument we may assume that the length of is as large a constant as we want, and thus we may conclude that there exists an index and an element such that there are as many as we want tuples in such that . Set and let be the sequence of exactly those such that . Notice that the length of is as large a constant as we want it to be.
Finally, the case . Note that and is a sequence of tuples; in fact all tuples in are identical. Thus, if the length of is at least , the first sequences of constitute a mutually pairwise compatible sequence of length . It is now straightforward but tedious to check how large has to be so that the length of is at least ; thus the claim holds. ∎
Now let , , be mutually pairwise compatible sequences from with a common type provided by the Claim. Let be an index such that and are pairwise compatible outside and . It is straightforward to check that such a always exists, for if and are not pairwise compatible outside or , there exists some indices such that , and for each such the value of the related has to be unique as are mutually pairwise compatible. Now must exist since the length of is while the length of is only .
Consider the models and and recall that We claim that there exists a sequence of relations on such that
(3) 
and thus that and . From this the claim of the theorem follows for .
It now suffices to show that such a exists. The idea is that looks exactly like with respect to points in and like with respect to points . Formally is defined as follows. For every relation and tuple

if is completely included in neither nor then we set ,

if is completely included in then we set iff ,

if is completely included in then we set iff .
Note that if is completely included in both and then there exists indices such that, for , . The former equality follows, with indices in , since and are pairwise compatible outside and . The latter equality follows since and are pairwise compatible. Since and have the same type iff , for all , and thus is welldefined. It is now immediate that (3) holds. ∎
For every formula of vocabulary of the form or , where and , , , are tuples of relation variables, there exists an equivalent formula of vocabulary .
Proof.
Consider first the formula . Define and let be the number of occurrences of in . The idea behind our translation is that the quantification of can be equivalently replaced by a quantification of an ary relation of size ; this can be then expressed in by quantifying many tuples (content of the finite relation).
Let denote the formula obtained from by replacing every occurrence of the relation variable of the form in by the formula . Define where, for each i, is a shorthand for and is the formula obtained from by substituting each occurrence of the relation variable of the form in by It is straightforward to check that is an formula of vocabulary equivalent with .
Consider then the formula . In order to simplify the presentation, we stipulate that and are of length one, that is, variables and , respectively; the generalisation of the proof for arbitrary tuples of secondorder variables is straightforward. By Lemma 4, we obtain such that and are equivalent.
The following formulas are defined via substitution; by we denote the formula obtained from by substituting each occurrence of the symbol by the symbol .

and , for ,

and , for .
Let denote the following formula of existential secondorder logic
It is immediate that and are equivalent. Note that is of the form , where is an formula. By repetitively applying the first case of this lemma to subformulas of , we eventually obtain an equivalent formula over as required. ∎
The expressive powers of and coincide.
Proof.
Every formula is also an formula, and thus it suffices to establish the converse direction. The proof proceeds via induction on the combined nesting depth of secondorder quantifiers and operators. For every formula of the form or , where , let denote the equivalent formula obtained from Lemma 4.
Let be an arbitrary formula with combined nesting depth of secondorder quantifiers and operators. Let be the formula obtained from by simultaneously substituting each of its subformula of the form or , where , by its translation . Clearly and are equivalent, and the combined nesting depth of secondorder quantifiers and operators in is . Thus, by induction, the claim follows, as for the formula is already in . ∎
5 MSO(TC) and counting
We define a counting extension of and show that the extension does not add expressive power to the logic. In this way, we demonstrate that quite a bit of queries involving counting can be expressed already in .
5.1 Syntax and semantics of CMSO(TC)
We assume a sufficient supply of counter variables or simply counters, which are a new sort of variables. We use the Greek letters and (with subscripts) to denote counter variables. The notion of a vocabulary is extended so that it may also contain counters. A structure over a vocabulary is defined to be a pair as before, where now also maps the counters in to elements of , where is the cardinality of .
We also assume a sufficient supply of numeric predicates. Intuitively numeric predicates are relations over natural numbers such as the tables of multiplication and addition. Technically, we use an approach similar to generalised quantifiers; a ary numeric predicate is a class of tuples of natural numbers. For a numeric predicate , we use as a symbol referring to the predicate. For simplicity, we often call also numeric predicate. Note that when evaluating a ary numeric predicate on a finite structure , we let the numeric predicate access also the cardinality of the structure in question, and thus consists of tuples and not tuples. This convention allows us, for example, to regard the modular sum , where refers to the cardinality of the structure, as a ary numeric predicate.
We consider only those numeric predicates which can be decided in . Since, on finite ordered structures, firstorder transitive closure logic captures , this boils down to being definable in firstorder transitive closure logic when the counter variables are interpreted as points in an ordered structure representing an initial segment of natural numbers (see Definition 5.1 and Proposition 5.1 below for precise formulations).
The syntax of extends the syntax of as follows:

Let be a formula, a counter, and a firstorder variable. Then is also a formula. The set of its free variables is defined to be .

If is a formula and a counter then also is a formula with set of free variables .

Let , …, be counters and let be a ary numeric predicate. Then
Comments
There are no comments yet.