YARP workflow
The YARP workflow for automatically exploring a reaction network on silica-supported single site Ga catalysts consists of four components: Ga-silica cluster model construction, graph-based product enumeration, transition state localization and characterization, and reaction cycle investigation (Fig. 1). In brief, a Si8O12(OH)8 cluster is adapted from Ugliengo et al.38. A Ga-ethyl site is then created by substitution of a Si-OH moiety with a Ga atom and adding an ethyl group to the Ga site, which serves as the starting reactant (Fig. 1a). Once the cluster model is built, graph-based product enumeration is recursively applied to a subset of reactive atoms, involving the gallium, carbon, and hydrogen atoms attached to carbon in the cluster model (shown as pink, gray, and white spheres in Fig. 1b) to generate potential products. Product enumeration is performed using generic elementary reaction steps (ERS) consisting of bonding changes in the reactants that can be applied combinatorially to define closed reaction spaces. After the ERS-based product enumeration, transition states of the enumerated reactions are characterized using the same procedure as our earlier study33, consisting of reaction geometry initialization and growing string method (GSM)39 search performed with the GFN2-xTB semi-empirical model chemistry40, followed by Berny optimization and intrinsic reaction coordinate (IRC) calculations performed using DFT (Fig. 1c). All of the kinetically-relevant products explored in this procedure serve as inputs for further reaction exploration until the discovery of new reactions is exhausted. In the end, important reaction pathways and catalytic cycles are analyzed in detail based on the reaction mechanisms discovered during the exploration (Fig. 1d). Detailed descriptions of each component are provided in the Methods section.

a A cluster model of a Ga3+ single site is built from a conventional periodic model. b Possible products are recursively enumerated from reactants/intermediates produced from elementary reaction steps on the cluster model. c Transition state localization and characterization are applied to each enumerated reaction. d Once the network exploration recursion terminates, detailed reaction mechanisms and relevant reaction cycles are summarized.
The complete reaction network involving Ga3+
The overall reaction network that was generated by YARP for ethylene oligomerization on silica-supported single site Ga3+ is shown in Fig. 2. Network exploration was initialized with the Ga-ethyl species (node 0 in Fig. 2), which has been proposed as a key intermediate in the Cossee-Arlman ethylene oligomerization cycle37. After a single step of reaction enumeration and TS characterization, Ga-n-butyl, Ga-vinyl + ethane, and Ga-hydride + 1-butene, were identified as intended products of reactions between Ga-ethyl and ethylene. The free energies of activation (ΔG†) of forming Ga-n-butyl, Ga-vinyl, and Ga-hydride are 44.1, 59.8, and 93.5 kcal/mol, respectively. Based on its high activation energy of formation, YARP excluded Ga-hydride from further exploration, whereas Ga-n-butyl and Ga-vinyl were included as active nodes for further reaction exploration. The high activation energy of β-hydrogen elimination to form Ga-hydride has also been observed in previous studies using conventional periodic DFT analysis36,37. The second step of exploration identifies Ga-n-butenyl (from Ga-vinyl, ΔG† = 53.2 kcal/mol), acetylene (formed with Ga-ethyl from Ga-vinyl, ΔG† = 51.4 kcal/mol), Ga-hexyl (from Ga-butyl, ΔG† = 61.3 kcal/mol), 1-butene (formed with Ga-ethyl from Ga-butyl, ΔG† = 36.0 kcal/mol) and butane (formed with Ga-vinyl from Ga-butyl, ΔG† = 76.4 kcal/mol) as intended products. Notably, the lowest barrier step yielding 1-butene constitutes a rediscovery by the algorithm of the classic Cossee-Arlman mechanism that has previously been studied as the likely pathway for major product formation in this system. Based on the activation energies of the reactions at this iteration, Ga-n-butenyl was included as a new active node for further exploration (node 7), Ga-n-hexyl was classified as a terminal node (node 13) due to its size (see Methods for termination criteria), and 1-butene was added to the free-olefin list as a candidate for further reactions with the active nodes, Ga-ethyl (node 0) and Ga-vinyl (node 1). YARP recursively explored the reaction space via the same approach that was employed in the first and second iteration until all reactions within the prescribed constraints had been explored. All reactions explored with ΔG† < 80 kcal/mol are presented in Fig. 2, and detailed geometries of each node can be found in the SI.

The edge colors reflect the activation free energy (ΔG†) of each pathway as a measure of kinetic accessibility. Intermediate types are classified based on the alkyl and alkenyl attached to Ga and are denoted by different node colors.
Three key reaction types occurring on Ga3+
Three distinct types of reactions were discovered during the network exploration that are distinguished by their reactions with the adsorbed carbon species. All instances of each class exhibit ΔG† < 70 kcal/mol. The first type is responsible for lengthening (or breaking, in the case of the reverse reaction) the carbon backbone (Type I in Fig. 3). The TS of the Type I reaction involves a “C=C” moiety bonding to the catalyst to form a four-coordinated Ga intermediate that precedes bond formation with an adsorbed alkyl species. The second type of reaction is β-hydride transfer that liberates an oligomer and closes an oligomerization cycle (Type II in Fig. 3). In the TS of this reaction type, the β-hydrogen of the adsorbed alkyl species transfers to an incoming olefin, which binds to the Ga center and becomes a new adsorbate. An oligomerization cycle can also be completed by a β-hydride elimination step to form Ga-hydride, but YARP predicts a much higher activation energy for this pathway (see Supporting Information for details). The facile β-hydride transfer step on Ga is a fundamentally different reaction channel from those occurring on traditional transition metal catalytic sites, where the hydrogen being transferred is not interacting with the metal center41. The third type of reaction produces an alkane, leaving a hydrogen-deficient adsorbed species, like Ga-vinyl (Type III in Fig. 3). The TS of the Type III reaction resembles that of Type II, except that the hydrogen transfers to the α-carbon. Alkane formation has been reported in multiple olefin oligomerization experiments36,41,42,43, which may be explained by moderate barrier Type III pathways. Further, we hypothesize that the products of type III reactions may undergo additional type I and type II steps. The combination of type I-III reactions may eventually liberate alkynes and aromatics that are commonly considered to be coke precursors44,45 (region e in Fig. 4). For example, a low barrier (~42 kcal/mol) Ga-catalyzed conversion of ethyne to benzene is compared with a non-catalyzed conversion in Fig. S3.

(I) olefin insertion; (II) β-hydride transfer; (III) α-hydride transfer. Ri refers to hydrogen, methyl, and ethyl groups. The presented TS structures represent the simplest examples of I-III reactions.

a Oligomerization pathway to 1-butene. Isomerization pathways forming (b) 2-butene and c isobutene. d Cracking pathway for propylene formation. e An example of a pathway forming alkanes and hydrogen-deficient products starting from Ga-ethyl. Similar alkane formation cycles can also occur for species ②, ④, ⑥, and ⑩.
In addition to participating in the expected Cossee-Arlman oligomerization cycle, the elementary reaction types described above can contribute to several catalytic cycles for olefin isomerization and chain cracking (Fig. 4). In particular, following the formation of 1-butene and the recovery of the Ga-ethyl intermediate (species ⑤ in Fig. 4), where a Cossee-Arlman oligomerization cycle is nearly complete, the 1-butene molecule can be re-adsorbed with a simple rotation to react with Ga-ethyl through another β-hydride transfer (type II), producing Ga-2-butyl (species ⑥). This newly reported intermediate can undergo a facile type II reaction, forming cis- or trans-2-butene (only cis-2-butene formation is considered here, species ⑭). Butenes have been previously detected in experiments at 523 K and 1 atm, but distinguishing between cis and trans isomers is challenging due to their similar retention times in gas chromatography36. In an alternative pathway, Ga-2-butyl can undergo additional type I and II reactions to form Ga-methyl with physisorbed propylene (species ⑧). In addition, there can be another re-adsorption step of propylene on Ga-methyl, resulting in a Ga-isobutyl species (species ⑮), which eventually leads to isobutene (species ⑰). Throughout the isomerization and cracking pathways, the type III step can occur on each Ga-alkyl species, including Ga-ethyl, Ga-propyl, and Ga-butyl species. For example, a plausible pathway involving the type III reaction is outlined in the green circle of Fig. 4, where the resulting Ga-vinyl intermediate undergoes additional β-hydride transfer, leading to the formation of acetylene (a coke precursor).
Kinetic significance of type I-III transition states
Potential energy diagrams were used to compare the kinetic relevance of reactions cycles discovered for oligomerization, isomerization, cracking, and coking pathways (Fig. 5):
$$2{{{{{{{{\rm{C}}}}}}}}}_{{{{{{{{\rm{2}}}}}}}}}{{{{{{{{\rm{H}}}}}}}}}_{{{{{{{{\rm{4}}}}}}}}}\longrightarrow {{{{{{{{\rm{C}}}}}}}}}_{{{{{{{{\rm{4}}}}}}}}}{{{{{{{{\rm{H}}}}}}}}}_{{{{{{{{\rm{8}}}}}}}}}$$
(1)
$$3{{{{{{{{\rm{C}}}}}}}}}_{{{{{{{{\rm{2}}}}}}}}}{{{{{{{{\rm{H}}}}}}}}}_{{{{{{{{\rm{4}}}}}}}}}\longrightarrow {{{{{{{{\rm{2C}}}}}}}}}_{{{{{{{{\rm{3}}}}}}}}}{{{{{{{{\rm{H}}}}}}}}}_{{{{{{{{\rm{6}}}}}}}}}$$
(2)
$$\frac{n+2}{2}{{{{{{{{\rm{C}}}}}}}}}_{{{{{{{{\rm{2}}}}}}}}}{{{{{{{{\rm{H}}}}}}}}}_{{{{{{{{\rm{4}}}}}}}}}\longrightarrow {{{{{{{{\rm{C}}}}}}}}}_{{{{{{{{\rm{2}}}}}}}}}{{{{{{{{\rm{H}}}}}}}}}_{{{{{{{{\rm{2}}}}}}}}}+{{{{{{{{\rm{C}}}}}}}}}_{{{{{{{{\rm{n}}}}}}}}}{{{{{{{{\rm{H}}}}}}}}}_{{{{{{{{\rm{2n+2}}}}}}}}},\,{{\mbox{where}}}\,{{{{{{{\rm{n}}}}}}}}=1,\,2,\,{{\mbox{and}}}\,3.$$
(3)

a Comparison of the energy landscape for cycle (2) using the cluster model and periodic slab. b Comparison of competing olefin formation pathways (colored) and cracking pathways (gray). c Comparison of competing acetylene formation pathways (colored) and cracking pathways (gray). The species are numbered based on the pathway diagram in Fig. 4, and the energies are reported with respect to the single point energy of Ga-ethyl plus a gaseous ethylene molecule.
The competition between these cycles determines the selectivity of producing gaseous products and coke precursors. Cycle (1) involves ethylene dimerization products (Fig. 4a–c), including 1-butene and associated isomers. One catalytic cycle closes through an ethylene insertion (denoted as type I) and a β-hydride transfer (denoted as type II). Following additional type I-II steps occurring on Ga-ethyl with an adsorbed 1-butene (species ⑤), cis-2-butene and isobutene can also form (Fig. 4b, c). Cycle (2) involves the formation of cracking products, such as propylene, which are not favorable at a relatively low temperature (250 ∘C, 1 atm). One propylene molecule can be obtained through C-C bond breaking of a Ga-2-butyl species (reverse type I). The production of a second propylene molecule occurs via the same Cossee-Arlman oligomerization cycle initiated by the Ga-methyl intermediate (species ⑨, Fig. 4d). In cycle (3), the type III elementary step generates an alkane, which may occur for all Ga-alkyl intermediates, and an alkyne, such as acetylene, is formed that balances the stoichiometry. A relatively facile acetylene formation pathway occurs through a type II step occurring on the Ga-vinyl species from the type III reaction (species , Fig. 4e). Many other relatively low barrier pathways (≤70 kcal/mol) are discovered by YARP, including the formation of various CnH2n species, and CnH2n-2 isomers. As discussed later, although these pathways are not as favorable as the primary Cossee-Arlman cycle, they may be responsible for coking and deactivation pathways and result in a broad diversity of possible products.
To validate the accuracy of the cluster model results, they were benchmarked against conventional periodic DFT with the NEB-Lanczos algorithm for localizing TSs of reaction cycles (1) and (2) (Fig. 5a). For this comparison, the geometries and energies for the cluster model were recalculated at the B3LYP-D3/6-311G(d,p) level of theory to minimize the DFT errors as a confounding factor when comparing the cluster and slab results (see the Supporting Information for additional details on selecting a suitable DFT level). Overall, periodic DFT and the cluster model predict similar binding energies, reaction energies, and reaction barriers, while some systematic deviations can be observed, including modestly higher activation energies predicted by the cluster model. For example, the cluster and periodic models predict activation energies of 1.8 and 1.6 eV, respectively, for the ethylene insertion step (species ② to species ③). The difference may be attributed to long-range order and reconstruction effects in the silica support, which may systematically lower activation energies, but are absent in the cluster model. Another systematic difference is that the binding energies obtained from the periodic model are consistently lower (less negative) than those obtained from the cluster model (Fig. 5a). In the periodic model, the Ga site is surrounded by siloxane frameworks with various ring sizes, which can contribute to a weakened binding due to steric effects, especially for the species with longer carbon chains or with physisorbed olefins. This effect is particularly significant for species ⑤, where 1-butene, the largest physisorbed reactant in our analysis, is involved. The comparison is also affected by the distinct functionals that were used due to their differing availability in the reference molecular and periodic quantum chemistry packages. Nevertheless, the two approaches predict similar relative barriers for all of the TSs under each elementary step type. The mean difference between activation energies for type I versus type II reactions are both 0.8 eV, calculated by the cluster model and the periodic model. Further, all type I transition states are similarly accessible (i.e. barriers within 0.1 eV), and both models predict type II reactions to have consistently lower barriers. The overall agreement between the cluster and periodic models with respect to relative barrier heights validates the usefulness of the cluster models for performing reaction network exploration on silica surfaces.
Figure 5b, c outline the energy landscape comparison between the overall reaction cycles (1)-(3) using the cluster results. In cycle (1), where the carbon chain length doubles and 1-butene is formed (species ① – ⑤), the ethylene insertion involves a higher activation energy (1.76 eV) than the olefin liberation step (1.02 eV). This is consistent with a previous study showing that ethylene insertion is rate-determining in this system37. In the energy landscape of cycle (2), three type I elementary steps have relatively high activation energies: the ethylene insertion shared with cycle (1), the cracking of Ga-2-butyl (species ⑦, 2.40 eV), and the step forming Ga-1-propyl (species ⑩) from Ga-methyl and ethylene (2.08 eV). The cracking of Ga-2-butyl to form Ga-methyl and propylene involves the highest activation energy since it is a reversed type I step. Both periodic and YARP-cluster results predict that type I reactions are exothermic. Therefore, cycle (2) will not dominate the reaction network. Indeed, previous experimental results of Ga single sites show a strong selectivity to olefin oligomerization at 250 ∘C and 1 atm, forming short linear oligomers36. However, the activation free energy of cracking reduces as temperature increases (due to the entropically favored reverse type I step) in comparison to the formation of longer Ga-alkyl carbon chains, thus narrowing the energy difference between Ga-2-butyl (species ⑦) and the cracking TS. Based on the reaction entropies, the selectivity for propylene production over butene should increase with increased temperature or reduced pressure since cycle (2) produces a higher number of gas molecules than cycle (1). Finally, the high barrier of the reverse type I step provides a basis for the competition between type I and III reactions starting from the Ga-2-butyl species. In particular, the formation of 1-butane (species ⑥ – ) can be competitive with cracking reactions (species ⑦ – ⑧). Subsequently, acetylene formation can occur via facile type II reactions (species
–
, 1.68 eV). Therefore, our pathway analysis suggests that type III reactions are kinetically less favorable, but nevertheless represent side-reaction channels that become accessible as they compete with the reverse of type I step. With the formation of alkynes, other side reactions, such as aromatization and coking, may occur as subsequent thermodynamic products, especially at higher temperatures46,47. For example, the selectivities for producing alkynes, aromatics, and coke may increase with the pressure of ethylene since it shifts the reaction free energy of acetylene condensation to benzene. These insights into the reaction network and TSs downstream of Ga-ethyl formation also generalize to other metal single sites with similar electronic properties, such as Zn and Al48,49. In the Supporting information section 2, we have included the analogous energy diagrams for the reactions involving an Al single site. These results show energy barriers and a competition between cycles (1)-(3) that are similar to those of Ga.
Given the generic reaction rules and size constraints that were used to generate the ethylene oligomerization reaction network in this work, there are many opportunities for applying this approach to other heterogeneous systems. Among the salient details of the implementation to consider for future applications are the use of a cluster model as a surrogate for a periodic slab and the major speedup provided by semi-empirical quantum chemistry. Neither detail is intrinsic to applying YARP, and indeed, the cluster assumption was validated here and adopted out of convenience. There are, however, no obstacles to applying YARP using a periodic code, outside of cost. The applicability of this approach to other heterogeneous surfaces is therefore anticipated and is currently under investigation.
These results demonstrate how automatic exploration can be applied to heterogeneous catalytic networks using ethylene oligomerization catalyzed by a silica-supported Ga single site as a benchmark. The method (re)discovered the classic Ga-ethyl-centered Cossee-Arlman oligomerization cycle and several side-product pathways using generic graphical reaction rules and an ultra-low cost TS localization framework. The reaction network elaborated here represents the largest (in terms of both the number of reactive atoms, 20, and the branched alkyl intermediates up to C6) that has been reported for a heterogeneous catalyst using an automated exploration algorithm based on quantum chemically characterized transition states. We do not expect this record to last, since there are relatively few impediments to applying this approach to other systems. Among the foreseeable challenges are that we have not considered adsorption and desorption steps that are often rate-limiting within catalytic cycles. When these steps are non-covalent and molecular in nature, the presented framework can already support them in scenarios where they apply. A more fundamental limitation is that the catalytic surface was treated as static, save for the reactive atom and its nearest neighbors. This assumption will break down for cycles where the surface significantly restructures, such as for nanocatalysts, or where electro-deposition/desorption occurs. This could be accommodated by expanding the space of reactive atoms, but this will increase costs and presents opportunities for further innovation.
As network exploration becomes a predictive tool for catalyst screening, analyzing the reaction data and extracting general insights will become a bottleneck. As one example, characterizing the full reaction network reported here took around one-week of computational time on minimal resources (one node with 128 cores), but the costly manual process of classifying reaction mechanisms and closed cycles belonging to the same family of transformations took the majority of the time in this study. Although knowledge extraction of this kind will always be manual to a degree, there are several opportunities for streamlining this through automated mechanism classification based on orbital analyses and microkinetic modeling, both of which will be leveraged in the future.