Structural requirements for photo-induced RNA-protein cross-linking

Understanding structure-function relationships of RNA-binding proteins requires knowledge of how they bind RNAs in vivo. RNA-protein interactions are studied using light-induced cross-linking at “zero distance”, which yields nucleotide/amino-acid adducts for mass-spectrometry (MS)-based characterization. However, the prerequisites for cross-linking are poorly understood, which limits the interpretation of cross-linking data. Here, we report novel insights into cross-linking requirements, using as a model the RBFOX RRM domain bound to 13C-labeled variants of its heptaribonucleotide binding element. We probed the influence of nucleotide identity, sequence position and amino-acid composition, using tandem MS to assign cross-links at site-specific resolution. We observed cross-linking at three nucleotides, all of which were stacked onto phenylalanines. Surprisingly, this stacking was required for neighbouring amino acids to cross-link, and the same pattern is apparent in published RNA-protein datasets. We hypothesize that π-stacking activates cross-linking via electron transfer, after which nucleotide and peptide radicals, possibly stabilized by capto-dative effects, recombine. These findings should facilitate the interpretation of cross-linking data from structural studies and genome-wide datasets.


Introduction
The human genome encodes more than 1500 RNA binding proteins (RBPs) that regulate key processes, including translation, localisation, stability and splicing 1-3 . To fully understand the structure-function relationship of an RBP, it is necessary to identify which RNAs it binds in vivo and how non-covalent interactions occur in the binding site. RNA-protein binding occurs at conserved RNA binding domains, such as RNA recognition motifs (RRM), heterogeneous nuclear ribonucleoprotein (hnRNP) K-homology domains and zinc finger (ZnF) domains 4,5 . These domains recognize short, usually single-stranded regions of 3-8 nucleotides (nt), known collectively as consensus RNA binding elements (RBE) 6,7 , that often contain degenerate positions. Additional binding affinity and selectivity can be generated via supplementary contacts between the RNA and the protein 5,8 ; for example, the RBP FUS has a bipartite binding mode comprising its ZnF domain and its RRM 9 . RNA-protein binding has also been observed with proteins that lack canonical RNA binding domains (RBDs) 1 . Taken together, these features make it difficult to predict an RBP's substrates from a computational search for its consensus RBE alone. Indeed, recent studies of the RBFOX protein family showed that only half of the isolated RNA targets contain the RBFOX consensus binding motif and that other motifs presumably account for some of its splicing activities [10][11][12] .
Recently, we introduced cross-linking of segmentally isotope-labelled RNA and tandem mass spectrometry (CLIR-MS), which identifies the sites of amino acid/ribonucleotide cross-links in a single protocol 32 . In CLIR-MS, RNA regions that are suspected to interact with a protein are synthesized in isotopically labelled light and heavy variants, so that nucleotides involved in cross-linking events appear as peak doublets in the resulting mass spectrum. By focusing on peak doublets during data analysis, cross-linked amino acids and nucleotides are reliably identified after controlled degradation 32 .
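The doublet logic can be illustrated with a short sketch. It assumes, as a simplification, that each labelled nucleotide carries a fully 13C-labelled ribose (five carbons, i.e. a shift of about 5.017 Da, corresponding to the 5 Da mass shift used in the database searches described later), and that precursor peaks have already been centroided and assigned a charge state. The `Peak` type, the `find_doublets` function and the 10 ppm tolerance are hypothetical illustrations, not part of the published CLIR-MS software.

```python
from dataclasses import dataclass

# Mass difference between 13C and 12C, in Da.
C13_MINUS_C12 = 1.0033548


@dataclass(frozen=True)
class Peak:
    mz: float         # observed m/z of a centroided precursor signal
    charge: int       # assigned charge state (z > 0)
    intensity: float  # not used for pairing, kept for downstream filtering


def find_doublets(peaks, n_labelled_carbons=5, tol_ppm=10.0):
    """Return (light, heavy) peak pairs at the same charge state whose m/z
    difference equals n_labelled_carbons * (13C - 12C) / z within tol_ppm."""
    shift = n_labelled_carbons * C13_MINUS_C12
    pairs = []
    for light in peaks:
        expected_heavy = light.mz + shift / light.charge
        for heavy in peaks:
            if heavy.charge != light.charge:
                continue
            if abs(heavy.mz - expected_heavy) <= expected_heavy * tol_ppm * 1e-6:
                pairs.append((light, heavy))
    return pairs


if __name__ == "__main__":
    spectrum = [
        Peak(800.0, 2, 1.0e5),                          # light partner
        Peak(800.0 + 5 * C13_MINUS_C12 / 2, 2, 9.0e4),  # heavy partner (+5 x 13C)
        Peak(650.3, 3, 5.0e4),                          # unpaired signal
    ]
    for light, heavy in find_doublets(spectrum):
        print(f"doublet: {light.mz:.4f} / {heavy.mz:.4f} (z={light.charge})")
```

Only signals with a partner at the expected isotopic offset survive, which is what makes the labelled nucleotides stand out against unlabelled background. Using 13C/15N labels, as in the original segment-labelling protocol, would only change the expected mass difference.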
The photo-induced formation of covalent bonds between amino acids and ribonucleotides occurs between free radical species at "zero distance" [33][34][35] . Reactions are biased towards uridines and guanosines 19,36,37 , but most amino acids can participate 18,19 . However, cross-links typically only occur at specific positions in the RNA-RBP motif, for which there is currently no mechanistic rationale 38 .
Moreover, it has proven very difficult to gain a deeper understanding of the factors that promote cross-linking, at least partly because the RNA-protein binding-site environment, which is critical for cross-linking chemistry, cannot be recreated in simple solvents. Furthermore, cross-linking reactions usually produce complex product mixtures that are difficult to separate and characterize against a background of possible protein and nucleic acid UV damage 39 .
Here, we have systematically investigated the structural requirements for the cross-linking of an RNA to its RBP partner. We used the RRM domain of the RBFOX family (FOXRRM) and its RNA consensus binding motif U1G2C3A4U5G6U7 (FOXRBE) as a model system, because the complex forms with high affinity, the RRM is a highly abundant RNA binding domain and its structure has been well characterized 40 . We introduced 13C-labelled ribonucleotides systematically into the FOXRBE heptanucleotide using solid-phase synthesis and used CLIR-MS to identify RNA-protein cross-links with site-specific resolution. High-yielding cross-linking was localised at two clusters of amino acids around two phenylalanines, consistent with previous findings (Götze et al., submitted). However, with few exceptions, it only occurred at U1, G2 and G6. We then employed site-specific mutagenesis of both the RNA and the protein to probe the influence of nucleotide species, sequence position and amino acid composition on cross-linking. For each RNA-protein pair, we first confirmed that the binding mode of the wild-type interaction was maintained. The analysis revealed that: i) high-yielding cross-linking only occurred at three of the seven nucleotide positions and involved guanosine or uridine, and guanine and uracil were not interchangeable; ii) only nucleobases that stacked with aromatic amino acid side chains reacted strongly; and, surprisingly, iii) this primary stacking interaction was required for neighbouring amino acids to react. We confirmed the importance of this primary stacking feature in other published studies for which cross-linking and structural data are available. The analysis strongly suggested that a stacking interaction is required to activate a nucleotide for free-radical-based cross-linking reactions in native RNA-protein complexes. We expect that this finding will facilitate the interpretation of RNA-protein cross-linking data. 
Moreover, it will aid the development of new tools for de novo motif discovery (see refs in 37 ) especially for non-canonical binding motifs, and it will help guide the design of future cross-linking experiments and methods.

Optimization of CLIR-MS to identify RNA-protein cross-links with site-specific resolution
Recently we (RA, AL, FA) introduced the CLIR-MS technique, which identifies RNA-protein cross-links on both the protein and the RNA 32 (Fig. 1a). The original CLIR-MS protocol employs RNAs with contiguous regions of differentially isotope-labelled nucleotides in the cross-linking step. After partial RNA and protein digestion, peptide-oligonucleotide conjugates are identified as matched signal pairs in the precursor ion mass spectrum. This facilitates localisation of the cross-linked nucleotide to the labelled RNA segment. One drawback of the original implementation of CLIR-MS is the inherent requirement for enzymatic synthesis of 13C/15N-labelled RNA (i.e. in vitro transcription). This limits the minimal length, as well as the number and positions, of labelled segments that can be included within an RNA of interest. A second limitation is the nuclease digestion step, which typically yields short oligonucleotides (i.e. 1-4 nt). As a result, the specific base involved in the cross-link usually cannot be called with confidence, although longer overlapping partial sequences facilitate localisation of the interacting regions on the RNA. In this study, we sought chemical solutions to both problems. First, we switched from RNase digestion to alkaline hydrolysis of RNA (Fig. S1a), taking care not to degrade FOXRRM. Mass analysis of the product mixtures then detected a much greater fraction of mononucleotide adducts, allowing us to define unambiguously the nucleotides that are cross-linked (Fig. S1b). Second, we employed 13C-labelled phosphoramidites during solid-phase synthesis of RNAs to incorporate 13C-labelled nucleotides site-specifically 41 (Fig. S2a).

U1, G2 and G6 in FOXRBE cross-link to amino acids centred on phenylalanines in the FOXRRM
We employed a systematic approach in an effort to identify key structural requirements for RNA-protein cross-linking. We first used 13C-labelled versions of FOXRBE in a CLIR-MS protocol to identify all points of reaction between the RNA and the protein. We then synthesized mutated 13C-labelled variants of FOXRBE to determine how cross-linking varies with i) the identity of the individual nucleobase, ii) its position in the RBE, and iii) the amino acid composition of the FOXRRM binding site. We were mindful that mutating key sites in the RNA and the protein might alter the mode of (or even abolish) RNA-protein binding; therefore, for each variant we measured the binding affinity to FOXRRM using surface plasmon resonance spectroscopy (SPR).
Figure 1. … (see Table S1; *N indicates a 13C-labelled nucleotide). c) CLIR-MS results of b) filtered for mononucleotides, showing site-specific reactivity at U1, G2 and G6 (see Table S1). d) FOXRRM/FOXRBE structure showing two cross-linking clusters, centred on F126 with U1 and G2 (upper panel) and on F160 with G6 (lower panel), based on the structure of Auweter et al. 40 . Structures were visualized with PyMOL (The PyMOL Molecular Graphics System, Version 2.5, Schrödinger, LLC).
We synthesized the seven 13C-labelled isotopic versions of FOXRBE and confirmed correct incorporation of the label by liquid chromatography-mass spectrometry (LC-MS) (Fig. S2b-c). We incubated each version of FOXRBE with FOXRRM and performed the CLIR-MS protocol. Mass analysis identified short oligonucleotide fragments cross-linked to peptides in clusters close to F126 and F160 (Fig. 1b). Each oligonucleotide signal in the spectrum of Figure 1b was detected because it contained a 13C-labelled ribose. However, except for mononucleotides, the actual site of cross-linking within the fragment could not be called; for example, a tetranucleotide fragment containing A, C, G and U might have cross-linked at any of its four bases. We noted that nearly all (>99%) of the fragments contained at least one uridine or guanosine, consistent with literature reports 16,19,36,37,42 that uracil and guanine are the main participants in cross-linking. Here, the use of alkaline hydrolysis for RNA digestion proved advantageous, since it caused a larger fraction of the RNA to be fully fragmented to mononucleotides (Fig. S1b), thereby facilitating unambiguous assignment of cross-linking sites.
Focusing only on the mononucleotide species in the spectra of Figure 1b revealed that cross-linking in the FOXRBE involved almost exclusively U1, G2 and G6 (Fig. 1c). The numbers of cross-links were in a similar range for the three nucleotides, although numbers of cross-links cannot be confidently compared between different experiments using the current CLIR-MS protocols. Analysis of the oligonucleotide cross-link products in light of the mononucleotide data revealed that, with few exceptions, the oligonucleotide species shown in Figure 1b derived from cross-linking of only one of these three nucleotides (U1, G2 or G6). For example, when A4 is labelled, no mononucleotides are detected, but small amounts of the tetranucleotides U1G2C3A4 and A4U5G6U7 are found cross-linked to F126 and F160, respectively (Fig. 1b). These products result from cross-linking of U1/G2 to F126 and of G6 to F160. The cross-linking of G2 and G6 was consistent with published CLIP data 37,43 (Fig. 1c). Although cross-linking from U1 was detected in the CLIR-MS experiments, cross-linking was hardly observed at U5 (vide infra) or U7, consistent with the hypothesis that strict structural parameters govern the photo-induced reactions between FOXRRM and FOXRBE. Very small numbers of isolated cross-links were also observed in some of the spectra of Figure 1b-c. Although low numbers of cross-links must be considered with caution, their locations suggested that in several cases they were not artifacts. In particular, the cross-links at F160 seen with *UGCAUGU and U*GCAUGU (Fig. 1b, …). Sites of cross-linking on the protein appear clearly centred at two phenylalanines (F126 and F160), with a distribution of 1-3 amino acids flanking these sites (Götze et al., submitted). This was confirmed by the MS2 spectra, in which the fragment ions present localise the RNA adducts unambiguously on the peptide backbone (Fig. S3). 
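The reasoning used above to assign oligonucleotide cross-links can be written as two small set operations. This is a hypothetical sketch, not part of the CLIR-MS pipeline: the function names are illustrative, it assumes that only U and G are candidate reactive bases (in line with the >99% observation), and it takes U1, G2 and G6 as the positions established from the mononucleotide data.

```python
FOXRBE = "UGCAUGU"  # positions 1-7: U1 G2 C3 A4 U5 G6 U7


def candidate_sites(start, end, seq=FOXRBE, reactive="UG"):
    """1-based positions within fragment [start, end] carrying a U or G."""
    return [i for i in range(start, end + 1) if seq[i - 1] in reactive]


def infer_origin(start, end, mono_reactive, seq=FOXRBE):
    """Keep only candidates that also cross-linked as mononucleotides.
    An empty result flags a fragment that this simple rule cannot explain."""
    return [i for i in candidate_sites(start, end, seq) if i in mono_reactive]


mono_reactive = {1, 2, 6}  # U1, G2 and G6 from the mononucleotide data (Fig. 1c)

# A4-labelled tetranucleotides from the example in the text:
print(candidate_sites(1, 4), infer_origin(1, 4, mono_reactive))  # U1G2C3A4
print(candidate_sites(4, 7), infer_origin(4, 7, mono_reactive))  # A4U5G6U7
```

For U1G2C3A4 the candidates remain U1 or G2, whereas for A4U5G6U7 only G6 survives the intersection, which mirrors the assignment made in the text. A fragment made up only of C and A would return an empty list and would need to be examined separately.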
Exchange of G2 or G6 for A2 or A6 greatly reduced overall cross-linking of the mutated sites to the amino acid clusters around residues 126 and 160, respectively (Fig. S4). The cross-links of U1, G2 and G6 aligned well with the NMR structure of FOXRRM bound to FOXRBE 40 (Fig. 1d; PDB ID: 2ERR). The largest number of spectra corresponded to U1 reacting with P125 and F126 and, to a lesser extent, with I124 and R127 (Fig. 1c). Similarly, G2 cross-linked to P125, I124, F126 and R127. U1 and G2 each stack on one face of F126. Hydrogen bonds also form between the U1 base and R127, and between the G2 base and I124. G6 reacted with F160, on which it also stacks, as well as with neighbouring amino acids at positions 158-164. F158 contacts the ribose of G6. Notably, several close RNA-protein contacts visible in the NMR structure, such as C3 interacting with F126 (without stacking), G6 stacking with R194 and U5 stacking with H120 40 , did not produce extensive cross-linking.
The current understanding of RNA-protein cross-linking is that close contact between nucleotides and amino acids is the main prerequisite for a cross-linking event 44,45 . However, only three of the seven nucleotides of FOXRBE engaged in efficient cross-linking, despite close contact between all nucleotides and amino acids in the binding site. To resolve this apparent contradiction, we investigated two obvious parameters that could influence cross-linking: the chemical reactivities of the nucleotides and amino acids, and the relative positioning of the reactive pair in space. By mutating selected nucleotides and amino acids in the binding pocket, we established a cross-linking structure-activity relationship for the FOXRRM-FOXRBE interaction.

Only uridine cross-links to FOXRRM from position 1 of FOXRBE
We synthesized the three labelled mutants *NGCAUGU (N = A, G, C; Table S2), as well as the corresponding per-labelled control sequences *N*G*C*A*U*G*U. We first confirmed by SPR that the NGCAUGU variants bound to FOXRRM. In this assay, parent UGCAUGU bound strongly to FOXRRM (Fig. 2a). This was consistent with the NMR structure, which shows that the 5'-uridine of FOXRBE contributes to binding by π-stacking on F126 (Fig. S4) 12,40 .
Next, we incubated the RNAs with FOXRRM and irradiated the complexes at 254 nm with increasing energies. Work-up and SDS-PAGE analysis of the three NGCAUGU sequence mutants revealed a new slow-migrating band, similar to that of wild-type FOXRBE (N = U), consistent with RNA-protein cross-linking (Fig. 2b). The appearance of a band on SDS-PAGE confirms that cross-linking occurs, but it identifies neither the site of cross-linking nor the composition of the product. To determine whether any of the mutants cross-linked at position N1, we turned to CLIR-MS. CLIR-MS data for per-labelled *N*G*C*A*U*G*U confirmed that the three FOXRBE mutants exhibit the same cross-linking "fingerprint" as wild-type FOXRBE, i.e. in the same two amino acid clusters around positions 126 and 160 (Fig. S6). However, to distinguish cross-linking of N1 from that of G2 in the 126 cluster, we performed CLIR-MS on the singly labelled sequences (*NGCAUGU). In contrast to U1, A1, G1 and C1 hardly cross-linked (Fig. 2c). This is consistent with the systematic presence of G2 in the di- or trinucleotides found cross-linked to F126, and confirms that in the mutants only G2 cross-links efficiently to the phenylalanine (Fig. S5). These results also confirmed the strong bias for uridine photo-reaction 19,36 , although it was surprising that G1 was unreactive, given the reactivity of G2.
To determine systematically the propensity for cross-linking at each site in FOXRBE when a more photo-reactive nucleotide (i.e. U or G) is present, we performed CLIR-MS on six additional positional FOXRBE mutants. We exchanged *U for C3 and A4 in FOXRBE (UG*UAUGU and UGC*UUGU, respectively), and *G for C3, A4, U5 and U7 (UG*GAUGU, UGC*GUGU, UGCA*GGU and UGCAUG*G, respectively). In each case, we confirmed by SPR and SDS-PAGE that the mutants bound and cross-linked to FOXRRM (Fig. 2d and Fig. 2e, respectively). Remarkably, in none of these six examples did the mutated nucleotides cross-link efficiently to the protein (Fig. 2f). The lack of reactivity at U3 (in mutant UG*UAUGU) was particularly surprising, given the close proximity of C3 to F126 in the NMR structure.
In summary, while G2 and G6 in wild-type FOXRBE cross-linked to FOXRRM, guanosine did not cross-link efficiently at any of the other five locations in FOXRBE. Similarly, uridine readily cross-linked to FOXRRM from position N1, where A, C and G were unreactive, but not from the four other locations in FOXRBE. Taken together, the data from this controlled model study confirmed that RNA-protein cross-linking events have strict requirements beyond the simple proximity of a reactive nucleotide and a reactive amino acid.
Figure 2. a) … measured in duplicate in a 1:1 dilution series at a minimum of six concentrations, starting at 2 µM. The curve for UGCAUGU was fitted using a 1:1 Langmuir binding model including mass-transport limitations; the Kd values of the remaining sequences were determined from a steady-state affinity fit. Wild-type FOXRBE shows the highest affinity for FOXRRM. b) All N1 variants undergo cross-linking with FOXRRM with increasing radiation dose; the cross-linking product band is indicated on the SDS-PAGE gels by "XL". c) CLIR-MS plots show that cross-linking occurs at U1 of FOXRBE, but not with the N1 mutants. Clean-up, enrichment and LC-MS/MS analysis were performed according to the CLIR-MS protocol. The xQuest search was carried out for the masses of mono- to tetranucleotides with a defined mass shift of 5 Da, and the data were filtered for mononucleotides (*N indicates a 13C-labelled nucleotide; the mutated nucleotide is labelled in red; see Table S1). d) SPR traces show that U or G mutations at N3, N4, N5 and N7 of FOXRBE attenuate, but do not abolish, FOXRRM binding. The curve for UGCAUGG was fitted using a 1:1 Langmuir binding model including mass-transport limitations; the Kd values of the remaining sequences were determined from a steady-state affinity fit. e) SDS-PAGE gels show that the FOXRBE variants cross-link to FOXRRM with increasing radiation dose. f) CLIR-MS analysis of singly labelled FOXRBE variants shows that protein-RNA cross-linking does not occur with U or G nucleotides located at N3, N4, N5 and N7 (see Table S1). Clean-up, enrichment and LC-MS/MS analysis were performed according to the CLIR-MS protocol. The xQuest search was carried out for the masses of mono- to tetranucleotides with a defined mass shift of 5 Da, and the data were filtered for mononucleotides (*N indicates a 13C-labelled nucleotide).
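For readers less familiar with the steady-state affinity fit mentioned in the Figure 2 legend, the following sketch fits the 1:1 binding isotherm R_eq = R_max·C/(K_d + C) to equilibrium responses. It is an illustration under stated assumptions, not the evaluation software used for the measurements: it ignores the mass-transport-limited kinetic model used for UGCAUGU and UGCAUGG, interprets the 1:1 dilution series as two-fold steps from 2 µM, and uses synthetic, noise-free responses for an assumed Kd. The function names are hypothetical.

```python
import math


def steady_state_response(conc, kd, rmax):
    """1:1 Langmuir isotherm: equilibrium response at analyte concentration conc."""
    return rmax * conc / (kd + conc)


def fit_steady_state(concs, responses, kd_min=1e-2, kd_max=1e5, steps=4000):
    """Least-squares fit of Kd and Rmax.

    Kd is scanned on a log-spaced grid; for each trial Kd the best Rmax has a
    closed form (linear least squares), so only one parameter is searched.
    Kd is returned in the same units as concs.
    """
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    best = None
    for i in range(steps + 1):
        kd = 10 ** (lo + (hi - lo) * i / steps)
        f = [c / (kd + c) for c in concs]
        rmax = sum(r * fi for r, fi in zip(responses, f)) / sum(fi * fi for fi in f)
        sse = sum((r - rmax * fi) ** 2 for r, fi in zip(responses, f))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]


# Two-fold dilution series from 2 uM (2000 nM), six concentrations.
concs_nM = [2000.0 / 2 ** k for k in range(6)]
# Synthetic responses for an assumed Kd of 250 nM and Rmax of 100 RU.
responses = [steady_state_response(c, 250.0, 100.0) for c in concs_nM]
kd_fit, rmax_fit = fit_steady_state(concs_nM, responses)
print(f"Kd ~ {kd_fit:.0f} nM, Rmax ~ {rmax_fit:.1f} RU")
```

Scanning Kd on a logarithmic grid avoids a nonlinear optimiser and keeps the example dependency-free; with real, noisy data a dedicated fitting routine (for example SciPy's `curve_fit`) and replicate-aware error estimates would normally be used.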

Aromatic amino acids play a key role in RNA-protein cross-linking
Analysis of the aforementioned CLIR-MS data (Fig. 1c, Fig. S6) had provided two important insights: i) on the RNA side, strong cross-linking only occurred with nucleotides that were stacked on aromatic amino acids (F126, F160; Fig. 1d); and ii) on the protein side, cross-links involved not only F126 and F160 but also one to three amino acids up- and downstream of them (positions 124-127 and 158-164).
We therefore mutated F126 in FOXRRM to histidine, tyrosine and leucine. We have previously shown using SPR that aromatic amino acids at position 126 are crucial for binding FOXRBE (F126Y: Kd = 2.21 nM; F126H: Kd = 25.9 nM), although a sterically fitting aliphatic amino acid such as leucine can partially substitute for the phenylalanine (F126L: Kd = 374 nM) 40 . We irradiated these variants in the presence of FOXRBE. All three mutants cross-linked to FOXRBE, as evident from SDS-PAGE (Fig. 3a).
To pinpoint the sites of cross-linking, we carried out CLIR-MS experiments with uniformly 13C-labelled FOXRBE (Fig. 3b). The F126Y and F126H mutants cross-linked to FOXRBE similarly to wild-type FOXRRM.
At F160, the cross-linking profile was similar for all three complexes. Around position 126, however, cross-linking of the F126Y and F126H mutants was concentrated on P125 and residue 126, whereas, when phenylalanine was exchanged for leucine, cross-linking at residue 126 was abolished. Notably, cross-linking to the neighbouring amino acids 124-127 was also mostly lost (Fig. 3b), confirming the primary role of the aromatic side chain in mediating the cross-linking reactions with the flanking amino acids at positions 124, 125 and 127.
Interestingly, in comparison with wild-type FOXRRM, the F126H mutant appears not to cross-link to G2, consistent with the absence of G mononucleotides (brown) or CG dinucleotides (turquoise) in Figure 3b (Table S1). Although we have no supporting data and know of no precedent in the literature, it is plausible that histidine has different cross-linking preferences from tyrosine or phenylalanine and/or that stacking with the guanosine is altered in this particular binding site. Unexpectedly, an H120 cross-link occurred with all three FOXRRM mutants, although it was hardly observed with wild-type FOXRRM (Fig. 3b, Fig. S6). Analysis of the oligonucleotide cross-links in Figure 3b strongly suggested that this cross-link involved U5. Notably, the NMR structure of FOXRRM-FOXRBE shows that U5 adopts a stacking arrangement with H120, and thus might have been expected to cross-link in the wild-type FOXRBE-FOXRRM interaction (Fig. S7). Taking these findings together, we propose that π-stacking interactions between aromatic amino acids (e.g. phenylalanine, tyrosine or histidine) and guanosines or uridines are an important prerequisite for cross-linking to the aromatic side chains, and also to the flanking amino acids.

RNA-protein cross-linking correlates with π-stacking interactions in other complexes
To determine whether these findings apply more broadly to RNA-protein cross-linking, we analysed data from four published studies in which cross-links had been localised at nucleotide/amino acid resolution and structural data were available. For example, Panhale et al. localised a cross-link in hnRNPC around a uridine π-stacked with F19 46,47 (Fig. 3c). In a novel approach, Lelyveld et al. used 18O-RNA labelling and targeted mass spectrometry to localise the cross-link of U11 in a let-7 microRNA precursor to a π-stacked phenylalanine (F55) in the Lin28 cold shock domain (Fig. 3c) [48][49][50] . Kramer et al. used a combination of cross-linking and mass spectrometry to pinpoint cross-linking sites on amino acids of yeast ribosomal protein S1 19 . These data were correlated with published crystal structures of the protein, which showed that tryptophan (W117) participates in a π-stacking interaction with uridine U1799 (PDB 4V88) (Fig. 3c) 19,51 . Finally, our own CLIR-MS data from PTBP1 in complex with the internal ribosomal entry site (IRES) of encephalomyocarditis virus (EMCV) 32 provided additional evidence for the role of aromatic amino acids in cross-linking reactions. Cross-links occur mainly at four aromatic amino acids (Y127, Y267, H411, H457), mostly with uridines. Comparison with the NMR solution structures of PTBP1 in complex with short polypyrimidine sequences shows that these amino acids are π-stacked with nucleotides (PDB IDs: 2AD9, 2ADB, 2ADC) (Fig. S8) 52 . In most cases, cross-linking occurs not only to the stacked amino acids but also to flanking amino acids, such as Q412.
Interestingly, one of these π-stacked interactions (H457) appears to involve a cytosine. Although other groups have described low-level cross-linking of cytosine 37,42,53 , we did not observe more than minor amounts in the mass spectral analyses in this study 18,19,47 .
Figure 3. … (Table S1). Binding affinities of FOXRBE are taken from ref 40 . c) Illustration of the localised cross-links in published structures: i) hnRNP C bound to AUUUUUC, solution structure obtained by Cienikova et al. 46,47 (PDB 2MXY); ii) Lin28A in complex with preEM-let-7f, obtained by Nam et al. 49,50 (PDB 3TS0); iii) 40S ribosomal protein S1 in complex with the 18S rRNA, obtained by Ben-Shem et al. 19,51 (PDB 4V88). Structures were visualized with PyMOL (The PyMOL Molecular Graphics System, Version 2.5, Schrödinger, LLC).
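The structural cross-comparisons above rely on judging whether a nucleobase and an aromatic side chain are π-stacked. A simple geometric check, sketched below under assumed criteria, compares ring centroids and ring-plane orientations. The thresholds (centroid distance of at most 4.5 Å, angle between ring normals of at most 30°) are common rules of thumb chosen here for illustration, not values used in this study, and the ring-atom coordinates would have to be extracted from the PDB entries by other means; `hexagon` is a hypothetical helper that builds idealised test rings.

```python
import math


def centroid(ring):
    """Mean position of the ring atoms (list of (x, y, z) tuples, in angstrom)."""
    n = len(ring)
    return tuple(sum(atom[k] for atom in ring) / n for k in range(3))


def ring_normal(ring):
    """Unit normal of a (near-)planar ring, computed with Newell's method."""
    nx = ny = nz = 0.0
    for i, (x1, y1, z1) in enumerate(ring):
        x2, y2, z2 = ring[(i + 1) % len(ring)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def is_pi_stacked(ring_a, ring_b, max_distance=4.5, max_angle_deg=30.0):
    """Heuristic stacking test: close ring centroids and near-parallel planes."""
    distance = math.dist(centroid(ring_a), centroid(ring_b))
    cos_angle = abs(sum(a * b for a, b in zip(ring_normal(ring_a), ring_normal(ring_b))))
    angle = math.degrees(math.acos(min(1.0, cos_angle)))
    return distance <= max_distance and angle <= max_angle_deg


def hexagon(z=0.0, radius=1.39, plane="xy"):
    """Idealised six-membered ring (bond length equals radius, ~1.39 A)."""
    pts = []
    for k in range(6):
        a = radius * math.cos(k * math.pi / 3)
        b = radius * math.sin(k * math.pi / 3)
        pts.append((a, b, z) if plane == "xy" else (a, 0.0, b + z))
    return pts


print(is_pi_stacked(hexagon(0.0), hexagon(3.5)))              # parallel, 3.5 A apart -> True
print(is_pi_stacked(hexagon(0.0), hexagon(3.5, plane="xz")))  # perpendicular -> False
```

A centroid-plus-angle test is deliberately crude: it does not distinguish face-to-face from parallel-displaced stacking, and NMR ensembles would require the check to be applied per model.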

Photo-induced electron transfer in a π-stacked RNA-protein complex may mediate radical reactions of cross-linking
The site-specific resolution provided by CLIR-MS, together with data from several published studies, has highlighted the role of π-stacking interactions as a prerequisite for cross-linking in the FOXRRM-FOXRBE interaction. Thus, U1, G2 and G6 yielded "direct" cross-links with the two phenylalanines on which they stack, and "indirect" cross-links with 1-3 amino acids flanking the phenylalanines. Most notably, nucleobases that do not stack on aromatic side chains did not take part in cross-linking.
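The distinction between "direct" and "indirect" cross-links can be made operational with a simple sequence-window rule. The sketch below is hypothetical: it labels a cross-linked residue as direct if it is itself a stacked aromatic residue and as indirect if it lies within a chosen number of positions of one. A window of three residues follows the 1-3 flanking amino acids described above, but because the G6 cross-links extend to position 164, the window is left as a parameter rather than fixed.

```python
def classify_crosslinks(crosslinked, stacking_residues, max_flank=3):
    """Map each cross-linked residue number to 'direct', 'indirect' or 'unassigned'.

    direct:     the residue is itself a stacked aromatic residue
    indirect:   the residue lies within max_flank positions of a stacked residue
    unassigned: neither (e.g. a contact not explained by a primary stacking pair)
    """
    labels = {}
    for res in crosslinked:
        if res in stacking_residues:
            labels[res] = "direct"
        elif any(abs(res - s) <= max_flank for s in stacking_residues):
            labels[res] = "indirect"
        else:
            labels[res] = "unassigned"
    return labels


# Residues discussed for wild-type FOXRRM, plus H120 for comparison.
observed = [120, 124, 125, 126, 127, 158, 160, 164]
print(classify_crosslinks(observed, stacking_residues={126, 160}, max_flank=4))
```

A sequence window is only a proxy; in the proposed mechanism the relevant quantity is the spatial proximity of backbone α-carbons to the stacked aromatic ring, which a structure-based distance cut-off would capture more faithfully.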
Free radical reactions of nucleic acids and proteins have been well studied in the context of oxidative damage and electron transfer 54,55 , but less thoroughly investigated for RNA-protein interactions 33,44 .
However, a study of the photo-induced intramolecular cyclization of 5-benzyluracil and 6-benzyluracil via benzyl and uracil radical intermediates suggests a plausible model for the cross-linking of U1 with F126 (Fig. 4a) 39 . In this model, photo-induced electron transfer in the U1-F126 complex generates a short-lived radical anion/cation pair (exciplex) (Fig. 4b; 1 and 2). Subsequent protonation of the uracil radical anion can yield a neutral α-hydroxy radical 55 , whereas ready deprotonation of the F126 radical cation produces a stabilized benzylic radical. In the absence of oxygen, the major fate of these free radicals is recombination to form the direct U1-F126 cross-link (Fig. 4b; 4). An analogous mechanism has been proposed for the reaction between free uracils/halogenated uracils and tyrosine derivatives 35,56 .
Figure 4. … Possible mechanism of UV cross-linking between the stacked F126 and U1 of FOXRRM (1). Photo-induced electron transfer leads to a radical ion pair (2). After protonation/deprotonation steps, the radicals at the benzylic position of F126 and at C4 of uridine (3) recombine to yield direct cross-links (4). Indirect cross-links between U1 and R127 (5), P125 (6) or I124 (7) may form when the radical cation of F126 oxidizes amide carbonyls of flanking amino acids, which rearrange to radicals stabilized by capto-dative effects at the α-carbons (*).
Alternatively, the F126 radical, or radical cation, may rearrange to neighbouring amino acids in processes mediated by hydrogen atom abstraction 55 , or via oxidation of amide carbonyls (by the F126 radical cation) 57 , yielding free radicals at peptide α-carbon sites on the protein backbone. Viehe et al. have proposed that α-carbon radicals are especially stabilized thermodynamically by capto-dative effects, i.e. simultaneously by electron-withdrawing (-C=O) and electron-donating (-NR2) groups 58 , and, furthermore, that they readily combine with other radicals. Hence, depending on the lifetimes and locations of these radicals on the protein backbone, "indirect" cross-links to U1 may form, yielding products that are identified by mass spectral analysis after controlled RNA and protein degradation (e.g. 5-7; Fig. 4b). These steps are consistent with the outcome of cross-linking reactions of the F126 mutants: exchange of phenylalanine for histidine or tyrosine produced similar direct and indirect cross-links, whereas leucine was mostly inactive, presumably because its aliphatic side chain cannot take part in the initial electron transfer.
Based on the similarity of the cross-linking profiles of U1 and G2 (Fig. 1c), it seems likely that G2 and G6 follow a mechanistic path similar to that of U1. In this scenario, photoexcitation of the stacked guanine-phenyl ring systems produces free radicals at G2 and G6, as well as on the peptide backbone around F126 and F160. Recombination yields direct and indirect cross-links, which in the case of G2 involve the same α-carbon radicals that couple with U1. The nature of the initial exciplex formed by electron transfer in a stacked guanosine-phenylalanine pair is unclear, and we were unable to find a literature precedent for such a mechanism. However, well-cited studies have shown photo-induced electron transfer between π-stacked pyrimidine and purine nucleobases that produces long-lived exciplexes 59,60 . Electron transfer between an amino acid and a nucleotide might be expected to occur in the direction that yields the lowest-energy exciplex. However, owing to the special environment of an RNA-protein binding site (see discussions in refs 57,59 ), this may not necessarily correlate with the redox potentials of isolated nucleotides or aromatic amino acid side chains.
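The link between redox potentials and the direction of photo-induced electron transfer is often framed with the Rehm-Weller relation, which estimates the free energy of electron transfer from a donor D to an acceptor A when one partner is electronically excited. It is included here only as a general reference point: no such calculation was performed in this study, and which partner (phenyl ring or nucleobase) acts as the donor is left open, as in the text.

$$
\Delta G_{\mathrm{ET}} = e\left[E^{\circ}\!\left(\mathrm{D}^{\bullet+}/\mathrm{D}\right) - E^{\circ}\!\left(\mathrm{A}/\mathrm{A}^{\bullet-}\right)\right] - E_{0,0} + w,
\qquad
w = -\frac{e^{2}}{4\pi\varepsilon_{0}\varepsilon_{r} r}
$$

Here E°(D•+/D) and E°(A/A•−) are the one-electron oxidation potential of the donor and reduction potential of the acceptor, E₀,₀ is the excitation energy of the excited partner, and w is the Coulombic stabilization of the radical ion pair at separation r in a medium of relative permittivity εr. Electron transfer is thermodynamically favourable when ΔG_ET < 0. Because the potentials are normally measured for isolated species in solution, and εr in a binding site is poorly defined, the local environment can shift the effective values, which is the caveat raised above.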
Together, our observations demonstrate the importance of local environment to cross-linking in the RNA-protein binding site, and at least partly explain why cross-links occur only at specific positions in an RNA-RBP motif.

Conclusion
For a complete understanding of the roles that RBPs play in cellular processes, it is necessary to understand at the atomic level how RNA binding domains in proteins engage with RNAs. RNA-protein interactions are generally characterized in two main ways in vivo: isolating proteins and sequencing the bound RNAs (CLIP methods), and identifying proteins bound to RNAs, for example, by mass spectrometry. Most of these approaches rely upon photo-induced cross-linking, which provides direct evidence of binding under native conditions. However, presently, native cross-linking-based methods suffer from two drawbacks: i) it is challenging to identify simultaneously sites of cross-linking on the RNA and protein, ii) cross-linking in an RNA-RBP motif typically proceeds inefficiently and in an unpredictable fashion.
Recently, we introduced the CLIR-MS method 32 . This technique employs isotope-labelled RNAs to resolve amino acid/ribonucleotide cross-links in a single protocol, whereby segments of labelled RNA are produced by in vitro transcription prior to ligation-assembly into a full-length RNA. In this new study, we have broadened the application of CLIR-MS through the use of chemically synthesized 13C … Photo-irradiation of the FOXRRM-FOXRBE complex led to key observations with potentially wide-ranging consequences: 1) strong cross-linking occurred between U1, G2 and G6 and clusters of amino acids centred around the phenylalanines F126 and F160; 2) very little cross-linking was observed at other uridines in the parent or mutated FOXRBE (C3/U3, U5 and U7); and 3) amino acids that flank F126 and F160 also cross-linked efficiently to U1, G2 and G6, but not to other nucleotides of FOXRBE. Since the NMR structure of FOXRRM-FOXRBE 40 shows that U1 and G2 stack on F126, and that G6 stacks on F160, the data suggest that a stacking interaction is a requirement for cross-linking events in an RNA-protein interaction, at least for this RRM domain. Indeed, other aromatic side chains could substitute for F126 in cross-linking, but incorporation of leucine abolished direct and almost all indirect cross-linking to U1/G2. Other researchers have noted in passing the increased presence of aromatic amino acid side chains in UV cross-linking data sets (see refs 18,19,33,48,61 ), but have not, to our knowledge, recognized its role as a trigger for cross-linking, nor distinguished between direct and indirect cross-linking events. 
Additional factors are likely to contribute to cross-linking events in RNA-protein binding sites, including the efficiency of the photo-induced electron transfer, the ability to stabilize free radicals, the flexibility of the system to adopt the configurations required for the radical reactions 62 and the proximity of reactive pairs 63 (Sarnowski et al., in preparation). Furthermore, our findings cannot explain all RNA-protein cross-linking reactions, in particular those involving sulphur-containing amino acids such as cysteine, which are highly photoreactive and prone to cross-link, probably owing to the high reactivity of the thiyl radical 19,34,55 . However, our results may partly explain why double-stranded RNA cross-links so poorly to proteins 64 , since in double-stranded RNA π-stacking between aromatic amino acids and nucleobases is hindered. Indeed, double-stranded RNA-binding motifs (dsRBMs) recognize their substrates mainly by shape and by contacts with the sugar-phosphate backbone 7,65 .
The findings in this work were enabled by the combination of site-specific labelling with the CLIR-MS protocol, which identifies cross-linking sites at single-nucleotide and single-amino-acid resolution. This technology is inherently flexible, and we are exploring further improvements to the method (Götze et al., submitted). Our findings here demonstrate the importance of the local environment for cross-linking in the binding site, beyond the simple proximity of photo-reactive nucleotides and amino acids, and thereby help to explain, at least in part, why cross-links occur only at specific sites in an RNA-RBP motif. Because of the inherent variation in the ways that RBPs recognize their RNA targets, predictive modelling of RBP selectivity is extremely challenging. Our findings could inform the development of new tools 37,66 for de novo motif discovery. In a broader sense, we expect our findings also to be of value in understanding RNA-protein interactions from the analysis of CLIP data sets, which is currently an area of intense research 16 .