Radiation Test Results of the Virtex FPGA and ZBT SRAM for Space Based Reconfigurable Computing

Earl Fuller, Phil Blain, Michael Caffrey, Carl Carmichael,
Noor Khalsa, Anthony Salazar

1 Los Alamos National Laboratory
2 Novus Technologies, Inc.
3 Xilinx, Inc.

Abstract

A comprehensive Single Event Effects (SEE) characterization of advanced commercial technologies was conducted using the heavy-ion test facility at Texas A&M. The components evaluated included a 322,000 gate Virtex reprogrammable FPGA (XQVR300) from Xilinx, and several manufacturers' versions of 4Meg Zero Bus Turnaround (ZBT) SRAMs. The SRAMs all latched up at or below an LET of 60 MeV-cm$^2$/mg and no further testing was done on them. However, the Virtex FPGA was immune to single event latch-up up to an LET of 125 MeV-cm$^2$/mg. Detailed single event upset testing was then done under both static and dynamic operating conditions to understand the upset modes and develop mitigation strategies for a space based reconfigurable computing application. The upset sensitivity and the detection and mitigation techniques are discussed, and the results indicate that the Virtex FPGA is a good candidate for satellite applications.

I. INTRODUCTION

We at Los Alamos National Laboratory† are designing a high performance Reconfigurable Computing (RCC) space module for high-speed digital signal processing and on-orbit signal analysis. Reconfigurable computing, utilizing field programmable gate arrays (FPGAs), achieves processing performance in excess of 100 Msamples/sec and offers the capability for on-orbit reconfiguration. Reconfiguration offers evolvable hardware that accommodates multiple missions, targets, or techniques, all with the same space and power resources. This flexibility helps combat the problem of premature obsolescence. Adapting COTS technology to the hostile space environment is necessary to meet our performance goals. Several sources discuss the performance advantages[1,2,3] and architectures[4,5] of reconfigurable computing.

The availability of the Virtex SRAM-based FPGA by Xilinx, Inc., which provides up to one million configurable gates on a single chip, provides a unique opportunity to develop this system. The Single Event Effects (SEE) test results and methodology for the Virtex products will be discussed. We also require high speed SRAM, so the ZBT SRAMs available from Micron, IDT, and Motorola have been tested. There are three broad challenges for this design. The first is finding the components that will survive the radiation environment. The second is how to design for the high single event upset rate. The last concerns package and printed circuit board reliability in a thermal cycling environment.

II. CANDIDATE TECHNOLOGIES

A. Virtex FPGA

The Virtex FPGA is a relatively new product from Xilinx Inc. It is an SRAM based FPGA that supports a range of 50K to 1M configurable gates and is fabricated on 0.22 µm CMOS with 5 metal layers. Besides a significant increase in density, the Virtex also offers several architectural and process advantages. The smaller feature size offers substantial gains in power and speed. Improvement in the IO blocks in conjunction with on-chip delay locked loops (DLL) provides system level performance in excess of 150 MHz. The DLL offers skew adjustment and clock doubling capability, while additional registers in the IO block reduce setup and hold times. The IO blocks also support 16 different interface standards. The enhanced CLB (Configurable Logic Block) architecture now supports synchronous reset, which reduces logic requirements for synchronous design, eliminates the potential race conditions with asynchronous resets, and enhances the information supplied with static timing analysis. On-chip block RAM, up to 128K bits, is also available and supports true dual port synchronous operation. The part operates at a core supply of 2.5V with many different IO voltages supported. It is offered in new high-density packages including a 560-pin plastic BGA and a 680-pin fine pitch BGA that supports up to 512 user IO.

The candidate radiation tolerant part is a Virtex fabricated on epitaxial silicon wafer with the commercial mask set. Xilinx test data indicates that this technology is total ionizing dose (TID) tolerant to greater than 50k rads(Si). Utilizing an epitaxial silicon wafer fabrication process was expected to provide immunity to single event latch-up, making it suitable for many space applications. For space applications, where TCE mismatch and assembly reliability are a concern, the Virtex will be offered in a hermetic Column Grid Array (CGA) using the same footprint as the commercial BGA. This package should offer increased reliability because the columns under stress can deform, avoiding fracture.

This FPGA offers the speed, density, IO, and architecture required for reconfigurable computing, and is an excellent choice for commercial applications. The possibility of using it in space motivated the SEE testing. Two features of the architecture will also help overcome upset problems. The first is that the configuration bitstream can be read back from the part while in operation, allowing continuous monitoring for an upset in the configuration. Second, the part supports partial reconfiguration, though at a rather coarse granularity. Partial reconfiguration can speed upset recovery time, but there are many design challenges to be overcome. Xilinx is developing software tools to support this feature more effectively.

B. Additional memory

Effective reconfigurable computing requires many independent banks of very fast SRAM. Bandwidth is critical for high speed DSP. Off chip RAM can function as delay buffers, filter tap storage and other typical DSP memory needs. The specific need is for very fast access at the rate of the mathematical calculation, preferably read/write operations at twice the sample rate of the application. One architecture stands apart for this application, the Zero Bus Turnaround synchronous SRAM offered by Motorola, Micron and IDT. These parts perform back-to-back read/write operations in 12 ns, making them highly desirable for space based reconfigurable computing. The postulated trend of smaller feature sizes offering greater total ionizing dose tolerance motivated us to test these parts for a radiation environment. Data indicated that these technologies would provide TID immunity in the range of 35k to 100k rads(Si). The presence of epitaxial silicon in some of the samples (Micron and Motorola) also offered promise of tolerance to the radiation environment in space. The specific samples tested were the Micron 55L64L36F, IDT 71V547, and Motorola MCM63Z737. Each of these devices includes a 36-bit wide word size and various synchronous operating modes.

III. SINGLE EVENT EFFECTS TESTING

A. Test Strategy

A comprehensive characterization of a complex device can be challenging and particularly so when the function of the device is programmable as with an FPGA. The difficulty is devising a test that can be related to the eventual application so that any upset rate measured in the laboratory can be related to an expected upset rate in an orbital scenario. Accordingly, this test was divided into several orbital scenarios. 

Single Event Latch-up (SEL) testing was done first as a fundamental requirement. If latch-up immunity could not be demonstrated, characterization of soft errors could be irrelevant. SEL testing was conducted with the device in a known static state, monitoring power supply current with over-current protection to prevent damage if latch-up did occur. If an increase in current was observed under radiation, the device was reconfigured to determine whether the increase was due to a latch-up condition or simply to internal contention caused by SEU. A sample of each of the device types tested was cross-sectioned to verify the thickness of the top layers that the ion beam needed to penetrate to reach the sensitive silicon. The most difficult part to test is the FPGA, which has 5 metal layers for interconnection above the silicon. The total thickness of conductor and insulator that ions need to penetrate exceeds 13 microns for Virtex. The LET necessary to demonstrate SEL immunity is 125 MeV-cm²/mg, since there is essentially no particle in the galactic cosmic ray spectrum above this number. To meet this requirement it was necessary to use a high energy cyclotron such as the one at Texas A&M, which produces 2,068 MeV Au ions capable of penetrating this over-layer with sufficient residual energy to provide the required LET.

Static SEU testing was performed to measure the upset characteristic of each of the storage latches present in the part. The SRAMs are easily tested this way. The Virtex FPGA is similarly easy to test since a serial scan capability exists for each configuration routing bit and for the RAM, CLB, and other functional blocks of the part. In particular, the XQVR300 tested included the following accessible static bits (Table 1):

<table>
<thead>
<tr>
<th>Latch Type</th>
<th>Function</th>
<th>No. Bits</th>
</tr>
</thead>
<tbody>
<tr>
<td>CLB</td>
<td>Configurable Logic Blocks</td>
<td>6,144</td>
</tr>
<tr>
<td>IOB</td>
<td>Programmable IO Blocks</td>
<td>948</td>
</tr>
<tr>
<td>LUT</td>
<td>Look Up Tables</td>
<td>98,304</td>
</tr>
<tr>
<td>BRAM</td>
<td>Block RAM</td>
<td>65,536</td>
</tr>
<tr>
<td></td>
<td>Routing &amp; Other Bits</td>
<td>1,579,860</td>
</tr>
</tbody>
</table>

Dynamic SEU testing is needed to test what static testing misses. Even though we are able to interrogate more than 1.7M bits on the XQVR300, there is much more circuitry we are not testing which we generally refer to as combinatorial logic; that is, the circuitry that connects the latches together. Moreover, in dynamic operation, transient signal propagation can be upset if an ion strike occurs along such a path, and the sensitivity can vary with operating frequency. These additional sensitivities can add to the total cross-section of the device. In order to measure these effects, three different circuit designs were developed to highlight different sections of the FPGA with the capability to vary operating frequency from 5 MHz to 80 MHz. We measure particle fluence to the first upset to determine the cross-section of the part in this mode. Finally, the capability to read back the bit configuration allows a measure of the total number of static bit upsets that occurred for each dynamic upset. It is likely that not every static bit upset has a consequence in a dynamic circuit and we can get a measure of this factor by doing this kind of dynamic testing.

Finally, we look for Single Event Functional Interrupts (SEFI) which are upset modes of special functions unique to each part type. One can argue that any upset is an upset, however it is useful in designing a recovery and mitigation approach if we can measure the possibility and estimate the probability of catastrophic upset modes occurring. So we look especially for unexpected resets, JTAG TAP controller upsets, configuration control logic upsets, and the like, which have a unique upset signature not like single bits.

B. Texas A&M Cyclotron

The testing was done at the Texas A&M University Cyclotron Institute. This facility was chosen because it is one of the few that provides particle beams of the energy required to penetrate the silicon over-layers of the Virtex technology. This is of little concern for the lower LET range but is particularly important in verifying SEL immunity to an LET above 100 MeV-cm\(^2\)/mg. Using Au ions at an energy of 2,068 MeV provides an initial LET of 86.6 MeV-cm\(^2\)/mg with a penetration range of over 100 microns in the over-layers and the silicon. Variations in the incident angle provided an effective LET of 125 MeV-cm\(^2\)/mg, which met our objective of exceeding the maximum LET typically seen in the GCR spectrum. In addition, we could expose the devices to a fluence of up to 10\(^{8}\) ions/cm\(^2\), maximizing our probability of detecting a latch-up situation.
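The dependence of effective LET on incident angle is commonly approximated by a cosine law, with the angle measured from the surface normal. The sketch below illustrates only that approximation; the function name is hypothetical, and the cosine law ignores the further change in LET as ions lose energy in the over-layers, so it is not a substitute for the facility's dosimetry.

```python
import math


def effective_let(let_normal, tilt_deg):
    """Cosine-law effective LET for a beam tilted tilt_deg from the surface normal.

    let_normal is the LET at normal incidence (MeV-cm^2/mg).
    This is an approximation and an assumption of this sketch.
    """
    if not 0.0 <= tilt_deg < 90.0:
        raise ValueError("tilt angle must be in [0, 90) degrees")
    return let_normal / math.cos(math.radians(tilt_deg))


# Initial LET quoted in the text for 2,068 MeV Au ions, tilted by 30 degrees.
let_at_30_deg = effective_let(86.6, 30.0)
```

Under this approximation alone, a 30 degree tilt raises 86.6 MeV-cm²/mg to roughly 100 MeV-cm²/mg.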

C. FPGA Testing

Xilinx provided a very flexible silicon verification system called AFX (Advanced FPGA Development System). A photo of the hardware is shown in Figure 1.

![AFX motherboard](image)

Figure 1: AFX motherboard used for SEE testing of the Virtex FPGA. Support circuitry interfaces to a PC and allows for configuration of the test device to a specific pattern or functional design.

The system consists of the AFX motherboard and adapter cards for various package configurations, power supplies, a Firewire adapter card for a PC interface, and interface software. Test software was developed to read and write to all of the storage locations via the serial link to the device. Latch-up testing could be done with precise control of the power supply voltage and current. Static SEU testing was easily done given the ability to configure each bit as desired. Because an arbitrary pattern could result in contention or other illegal conditions in the FPGA, an “all-off” pattern was selected for the static test. An alternate pattern was also available to test for any pattern sensitivity. Table 1 above indicates the static bits that were tested. The test algorithm was implemented as follows:

1. Write configuration bit stream with “all-off” data pattern
2. Verify correct configuration with readback
3. Note quiescent current consumption.
4. Pause while ion beam is applied to a given fluence
5. Note current consumption.
6. Verify post radiation configuration with readback
7. Compare data before and after radiation.
8. Record bit upsets for all logic blocks
9. Reconfigure and read back; verify that current returns to its quiescent value and that configuration and readback function as expected.
10. Repeat at various LET and fluence values and plot SEU characteristic.
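The comparison and bookkeeping in steps 7, 8, and 10 can be sketched as follows. This is a minimal illustration, not the test software used with the AFX system: the function names are hypothetical, the readback images are modeled as plain byte strings, and the example values are illustrative rather than measured.

```python
def count_upsets(before: bytes, after: bytes) -> int:
    """Count bits that differ between two equal-length readback images."""
    if len(before) != len(after):
        raise ValueError("readback images must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(before, after))


def bit_cross_section(upsets: int, fluence: float, n_bits: int) -> float:
    """Per-bit cross-section (cm^2/bit) = upsets / (fluence [ions/cm^2] * bits)."""
    if fluence <= 0 or n_bits <= 0:
        raise ValueError("fluence and n_bits must be positive")
    return upsets / (fluence * n_bits)


# Illustrative values only (not measured data).
golden = bytes(8)  # "all-off" pattern, 64 bits
readback = bytes([0x00, 0x01, 0x00, 0x80, 0x00, 0x00, 0x03, 0x00])
n_upsets = count_upsets(golden, readback)
sigma_bit = bit_cross_section(n_upsets, fluence=1.0e6, n_bits=64)
```

Repeating this calculation at each LET value gives the points of a cross-section versus LET curve.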

The static pattern was also used as part of the SEL test. The above algorithm was followed with the exception that steps 6, 7, 8, and 10 were omitted. The important point is that SEL testing used the “all-off” pattern as a starting point and reconfigured to that pattern after radiation to verify that any current increase observed was not due to a latch-up condition.

The motivation for testing the FPGA in a dynamic environment is to estimate its performance in an operational scenario. The dynamic testing takes into account the possible increase in the sensitive cross-section of the part due to transient induced upsets. The goal was to develop a configuration for the device under test that utilized a large proportion of the resources in order to give the results statistical significance. The principal resources being considered are the CLB flip-flops, BRAM, LUTs and DLLs. The three different designs developed include one exclusively for LUTs and flip-flops, one focusing on the BRAM, and one combining the two.

The test fixture supplied by Xilinx allows for only a single device under test at a time. Rather than comparing the output of the device under test to a “golden” part, we tested two identical circuit modules in the same part and compared the two outputs to detect an upset. The software that determined the presence of an upset and alerted the operator monitored redundant outputs from the comparison circuit. Once the operator was alerted, the beam was stopped and dose measurements were made. In addition, the configuration bitstream was read back from the part to determine the number of static configuration upsets that occurred before the dynamic upset was detected at the circuit output.

The BRAM test circuit (Figure 2) treated each 4 Kbit block as a 511x8 FIFO that is filled from a random number generator (LFSR). Once the FIFO is full, its output is continuously compared to an identical random number generator, the comparison providing an indication of upset. No differentiation is made between an upset that occurs in the random number generator, the FIFO, or the comparison circuit. The outputs of 16 test FIFOs were logically OR’ed together and monitored by software. For each FIFO, the input generator operated from a different DLL than the generator that sourced the comparison, to determine any increased sensitivity from the clock management circuit. The BRAM test configuration utilized 100 percent of the available BRAM and 24 percent of the available logic slices.

![Figure 2: Block RAM dynamic test circuit.](image)
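The behavior of one such test FIFO can be modeled in software. The sketch below is a behavioral illustration only, not the FPGA design: the LFSR polynomial, the seed, the function names, and the way an upset is injected are all assumptions, since the text specifies only the 511x8 FIFO organization and the comparison against an identical generator.

```python
from collections import deque


def lfsr8(seed=0x5A):
    """Yield bytes from an 8-bit Fibonacci LFSR (taps 8, 6, 5, 4; assumed polynomial)."""
    state = seed & 0xFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        yield state
        bit = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
        state = ((state << 1) | bit) & 0xFF


def run_fifo_test(cycles, depth=511, upset_at=None, upset_mask=0x01):
    """Return the first cycle at which FIFO output and reference differ, else None."""
    source = lfsr8()      # generator that fills the FIFO
    reference = lfsr8()   # identical generator used for comparison
    fifo = deque()
    for _ in range(depth):                    # fill phase
        fifo.append(next(source))
    for cycle in range(cycles):
        if upset_at is not None and cycle == upset_at:
            fifo[depth // 2] ^= upset_mask    # inject a bit flip in a stored word
        out = fifo.popleft()
        fifo.append(next(source))
        if out != next(reference):
            return cycle                      # upset indication
    return None
```

In this model an upset in a stored word is reported only when that word reaches the FIFO output, so the detection latency depends on where in the FIFO the upset occurred.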

The CLB test circuit (Figure 3) partitions the available CLBs into two large shift registers where the shift register used both LUTs and CLB flip-flops. Each shift register was clocked with the clock from a separate DLL. Each shift register was fed by the same oscillating flip-flop. Output from the two shift registers was compared to detect upset, with an output monitored by software. This design utilized 95 percent of the available slices.

![Figure 3: Configurable Logic Block dynamic test circuit](image)

D. SRAM Testing

Testing synchronous SRAMs requires address, data, and select pins to be synchronized with an external clock. A PC based data acquisition system was developed to read and write a variety of patterns to test for all 0s, all 1s, and checkerboard data SEU sensitivity. SEL testing was performed as with the FPGA by controlling a power supply and monitoring current while preventing damaging over-current. The test was intended to interrogate the SEU sensitivity, but since the devices failed SEL testing, no detailed SEU characterization was done.

IV. TEST RESULTS

A. FPGA

SEL testing was conducted first to validate the space worthiness of the parts. Two samples were exposed to Au ions to achieve an effective LET of 125 MeV-cm²/mg. This was achieved with Au ions at 2,068 MeV, an incident angle of 30°, and allowance for energy attenuation in the overlayers of the Virtex parts. The devices were initialized with the “all-off” pattern so that the configuration state was known. The power supply was set to 2.5 volts, the fluence was allowed to accumulate to $10^7$ ions/cm² for most runs and in one case to $10^8$ ions/cm², and the power supply current was monitored. In each test, current increased during the particle exposure from starting values of 10 to 20 mA to 300 to 500 mA at the end of the radiation exposure. Without cycling power, the part was then reconfigured and the current returned to its pre-test level. The conclusion from this series of tests is that the part does not latch up to an LET of 125 MeV-cm²/mg. The current increase is due to internal contention created by logic upsets that accumulate throughout each run. This phenomenon was also observed during lower LET SEU testing. Finally, it should be noted that the exposure to $10^8$ ions/cm² produced an increase in current that remained after reconfiguration and after power cycling. This was attributed to the equivalent ionizing dose (>100k rad(Si)) accumulated by such a large fluence, and it gradually annealed over a few hours. Despite the dose and the resulting parametric degradation, the device remained functional.

Static SEU testing was then done to measure the upset sensitivity of all of the latch types identified in Table 1. Testing started at an LET of 125 MeV-cm²/mg and was gradually reduced to observe the threshold for upset for the storage latches. The lowest LET tested was 1.2 MeV-cm²/mg by which time the average bit cross-section was down more than 4 orders of magnitude from its saturation value. Figure 4 shows the bit cross-section measurements and the Weibull curve that best approximates the data.

Figure 4: The bit upset cross-section vs. LET for the Xilinx Virtex XQVR300 for static operation.
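A Weibull fit of this kind is usually written as σ(L) = σ_sat · (1 − exp(−((L − L_th)/W)^s)) for L above the threshold L_th. The sketch below evaluates that form. The saturation value of 8E-8 cm²/bit is the one quoted later in the text for Figure 4; the threshold, width W, and shape s used here are placeholders chosen for illustration, since the fitted parameters are not reported.

```python
import math


def weibull_cross_section(let, sigma_sat, let_th, width, shape):
    """Four-parameter Weibull cross-section (cm^2/bit) as a function of LET."""
    if let <= let_th:
        return 0.0
    return sigma_sat * (1.0 - math.exp(-(((let - let_th) / width) ** shape)))


# Placeholder fit parameters (assumptions, not the fitted values).
SIGMA_SAT = 8e-8   # cm^2/bit, saturation value quoted in the text
LET_TH = 1.2       # MeV-cm^2/mg, assumed equal to the lowest LET tested
WIDTH = 30.0       # MeV-cm^2/mg, hypothetical
SHAPE = 2.0        # dimensionless, hypothetical

curve = [(let, weibull_cross_section(let, SIGMA_SAT, LET_TH, WIDTH, SHAPE))
         for let in (1.2, 5.0, 20.0, 60.0, 125.0)]
```

With the real fitted parameters, the same function can be folded with an orbital LET spectrum to estimate upset rates.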

An effort was made to be sure that fluence accumulated long enough to achieve more than 1,000 total upsets in the device so as to provide for statistical validity in the result. As noted above, an accumulation of this many upsets often results in internal nodes being placed in contention, which results in increased current. The observed increase in current was negligible for just a few bits but could reach 500 mA for several thousand upsets. The current increase would tend to reach a peak and even decline as upsets accumulated, indicating that contention conditions varied as bit upsets accumulated.

After each run, the status of each bit was recorded so that it could be determined what latch types had upset and we could look for variations in the upset sensitivity of different storage elements. Table 2 shows the data that was observed.

Table 2: SEU Characteristics of latch types

<table>
<thead>
<tr>
<th>Latch Type</th>
<th>Threshold LET (MeV-cm²/mg)</th>
<th>Saturation Cross-section (cm²)</th>
</tr>
</thead>
<tbody>
<tr>
<td>CLB</td>
<td>5.0</td>
<td>6.5 E-8</td>
</tr>
<tr>
<td>LUT</td>
<td>1.8</td>
<td>21.0 E-8</td>
</tr>
<tr>
<td>BRAM</td>
<td>1.2</td>
<td>16.0 E-8</td>
</tr>
</tbody>
</table>

Because the number of routing bits dominates the total bits in the device, the weighted average bit cross-section matches that of the routing bits. The CLB, LUT, and BRAM bits differ from this average in threshold, in saturation cross-section, or in both (Table 2).

In addition to these bit upsets, one unusual upset signature was recorded which represents an upset in the configuration control logic register. In this situation the number of bit upsets observed exceeded the total number of particles incident on the die by as much as 10 times. This was not a multiple bit upset mode but rather an upset in the configuration control that results in more than a million bits being misread. This clearly is a SEFI type of upset and apparently represents a complete loss of configuration when it occurs. The observed LET threshold was between 8 and 16 MeV-cm²/mg, and the upset only occurred if the fluence exceeded 10⁵ ions/cm². Therefore the device cross-section for this upset mode is very low (<1 E-5 cm²) relative to the total cross-section for the part, and there is a very small probability of occurrence on-orbit.

Dynamic testing produced additional upsets as expected in transient signals or combinatorial logic. The cross-section for the device was measured by repeatedly measuring fluence to the first dynamic upset. Figure 5 shows a plot of this data.

![Dynamic SEU Cross Section for the Xilinx Virtex XQVR300](image-url)

Figure 5: The device upset cross-section vs. LET for the Xilinx Virtex XQVR300 operating in a dynamic mode. The Weibull curve does not fit the data points well for LET values above 25. This is likely due to the method of measuring fluence to the first failure. The ion beam operator was required to manually respond to an error indicator and terminate the beam. With total beam exposure times of 2 to 3 seconds, the response time added significant error to the fluence applied.
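If upsets are assumed to arrive as a Poisson process, the fluence to first upset is exponentially distributed with mean 1/σ, and a simple estimate of the device cross-section is the number of runs divided by the total fluence accumulated over those runs. The sketch below illustrates that estimator. It is not necessarily the data reduction used for Figure 5, the function name is hypothetical, and the fluence values are invented for illustration.

```python
def cross_section_from_first_upsets(fluences):
    """Estimate device cross-section (cm^2) from fluences to first upset (ions/cm^2).

    Assumes a Poisson upset process, for which the maximum-likelihood
    estimate is (number of runs) / (sum of fluences).
    """
    if not fluences or any(f <= 0 for f in fluences):
        raise ValueError("need at least one positive fluence value")
    return len(fluences) / sum(fluences)


# Hypothetical fluences to first upset for five runs at one LET (ions/cm^2).
runs = [30.0, 45.0, 25.0, 20.0, 40.0]
sigma_device = cross_section_from_first_upsets(runs)
```

Any delay in stopping the beam inflates the recorded fluence and therefore biases this estimate low, which is the effect the caption of Figure 5 describes.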

We observed between two and eight configuration bitstream upsets each time a dynamic upset was detected (the mean was 4.1 and the standard deviation was 2.4). Certain bits of the configuration bitstream are masked out at this point because they are toggling as part of the design, such as the LUTs, CLB flip-flops, and BRAM bits. The dynamic upsets observed may have occurred due to an upset of a CLB flip-flop, the LUT bits, BRAM bits, a transient, or by errors induced in the circuit by the static configuration bit upsets. On one test, we reset the circuit after detecting a dynamic upset and found that the circuit still functioned correctly after measuring eight static configuration bitstream upsets. This indicates that the static upsets observed did not contribute to the dynamic upset that was detected, suggesting that not all static configuration bitstream upsets contribute to a failure. The statistics are not large but the data suggests that perhaps 1 in 4 static bit upsets will result in an upset in the function of the device. This would subtract from the sensitive cross-section. For the XQVR300, there are 1.75M total bits (see Table 1). The saturation cross-section from Figure 4 is 8E-8 cm²/bit and therefore the total device cross-section is 1.4E-1 cm²/device. This calculated value is roughly 4 times the measured device cross-section shown in Figure 5 of 3E-2 cm², which tends to corroborate the indication that only 1 in 4 static bit upsets results in a device upset. The flexibility of the Virtex architecture results in many unused routing bits for any given design which helps explain why some bit upsets will not have a device upset consequence.
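The arithmetic behind this comparison can be reproduced directly from the numbers quoted in the text and in Table 1. The variable names below are ours; no values beyond those in the paper are introduced.

```python
# Bit counts from Table 1 for the XQVR300.
TABLE_1_BITS = {
    "CLB": 6_144,
    "IOB": 948,
    "LUT": 98_304,
    "BRAM": 65_536,
    "Routing & other": 1_579_860,
}
total_bits = sum(TABLE_1_BITS.values())          # about 1.75M bits

SIGMA_BIT_SAT = 8e-8           # cm^2/bit, saturation value from Figure 4
SIGMA_DEVICE_MEASURED = 3e-2   # cm^2, dynamic measurement from Figure 5

# Device cross-section if every static bit upset caused a device upset.
sigma_device_calc = 1.75e6 * SIGMA_BIT_SAT

# Ratio of calculated to measured device cross-section.
ratio = sigma_device_calc / SIGMA_DEVICE_MEASURED
```

The ratio evaluates to about 4.7, consistent with the rough 1 in 4 figure given the small statistics.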

Finally, if the detected dynamic upset was caused by a transient, then this would add to the sensitive cross-section. Unfortunately there is nothing in the data that would indicate whether this did or did not occur. In particular, we saw no significant difference in dynamic upset rate over the range of frequencies tested (5MHz to 80MHz). More work needs to be done in this area to provide better statistical data.

B. Memory

Since all samples latched up at both LET values tested (125 and then 58 MeV·cm²/mg), these COTS technologies were judged unsuitable and no detailed upset characterization was done.

C. On-orbit Upset Rate Estimates for the Virtex FPGA

The low LET threshold measured for this technology indicates that it will be sensitive to upset in many different orbits. Moreover, the upset rate will increase during periods of solar flares since the threshold is low enough for solar protons to cause upset. It is useful to get a sense of the upset sensitivity of this part by looking at several orbital scenarios. Table 3 shows sample orbits for which upset rate estimates are made below.

<table>
<thead>
<tr>
<th>Orbit</th>
<th>Altitude (km)</th>
<th>Inclination Angle (degrees)</th>
</tr>
</thead>
<tbody>
<tr>
<td>LEO</td>
<td>780</td>
<td>86</td>
</tr>
<tr>
<td>MEO</td>
<td>1,400</td>
<td>85</td>
</tr>
<tr>
<td>GPS</td>
<td>22,600</td>
<td>55</td>
</tr>
<tr>
<td>GEO</td>
<td>35,790</td>
<td>0</td>
</tr>
</tbody>
</table>

It is difficult to determine the bit upset rate for which there is a consequence in terms of a device upset. From the characterization of the Virtex FPGA, it is clear that not every bit upset will result in an upset of the device. The dynamic testing result indicates that only one in four bit upsets will result in an upset of consequence to the function configured in the FPGA. Admittedly, the statistics are small for this conclusion and it may be prudent to use a smaller ratio than 4 to 1. The only unusual upset signature observed was the configuration control logic register upset, which adds a small cross-section to the total bit cross-section as mentioned earlier. Finally, the upset contribution due to combinatorial logic and transient signal propagation could add cross-section to the total, but no data from this work indicates that this should be done. Device upset rates on orbit for the XQVR300 are shown in Table 4 below. These projections are based on the static data and count all bits in the calculation of cross-section. If the 1 in 4 ratio of static bit upset to device upset is applied, the rate estimates would be lower by up to 75%. The CHIME model was used, assuming the galactic and solar spectrum over a 5 year mission from 2001 to 2005, and the solar flare model used the JPL1991 spectrum.

<table>
<thead>
<tr>
<th>Orbit</th>
<th>Upsets per device-day, GCR with no flare enhancement</th>
<th>Upsets per device-day, GCR with JPL1991 flare enhancement</th>
</tr>
</thead>
<tbody>
<tr>
<td>LEO</td>
<td>2.05</td>
<td>20.9</td>
</tr>
<tr>
<td>MEO</td>
<td>2.35</td>
<td>23.7</td>
</tr>
<tr>
<td>GPS</td>
<td>5.77</td>
<td>72.2</td>
</tr>
<tr>
<td>GEO</td>
<td>5.90</td>
<td>81.5</td>
</tr>
</tbody>
</table>
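The effect of the 1 in 4 ratio on these projections, and the corresponding mean time between upsets, can be sketched as follows. The rates are those tabulated above; the function names are hypothetical, and the 0.25 fraction is the paper's tentative estimate rather than an established value.

```python
# Upsets per device-day from the table above: (no flare, JPL1991 flare).
RATES = {
    "LEO": (2.05, 20.9),
    "MEO": (2.35, 23.7),
    "GPS": (5.77, 72.2),
    "GEO": (5.90, 81.5),
}


def scale_rates(rates, functional_fraction=0.25):
    """Scale static bit upset rates by the fraction that upsets device function."""
    return {orbit: tuple(r * functional_fraction for r in pair)
            for orbit, pair in rates.items()}


def mean_hours_between_upsets(rate_per_day):
    """Mean time between upsets in hours for a given daily rate."""
    return 24.0 / rate_per_day


functional_rates = scale_rates(RATES)
geo_quiet_hours = mean_hours_between_upsets(RATES["GEO"][0])
```

For example, the unscaled GEO rate without flare enhancement corresponds to one upset roughly every four hours.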

V. ON-ORBIT SEU DETECTION, MITIGATION, AND RECOVERY

Digitally processing remotely sensed data presents substantial performance challenges that cannot be met with traditional radiation hardened computing. Adapting commercial FPGAs to the space environment helps meet the performance requirements, but presents its own challenges. In particular, the Virtex’s high SEU rate must be overcome by system level design including mitigation, detection, and recovery. Mitigation refers to alleviating the consequences of an upset by using SEU tolerant design techniques such as redundancy; detection refers to the ability to observe an upset; and recovery refers to the action taken when an upset is observed. Accepting occasional loss of data reduces the cost of meeting these challenges. In many applications this is obviously unacceptable; however, remote sensing systems already discard vast quantities of samples in search of the “needle in a haystack” information, or because of an inability to process the available data due to performance constraints or downlink limits. In this environment, achieving 99% duty cycle in a cost effective manner may be considered success.

The upsets experienced by the Virtex fall into three categories based on observability and severity: a static configuration bitstream upset, a dynamic upset (either a transient or an upset of a user memory cell), and a functional upset (e.g. configuration circuit or JTAG TAP controller). Static configuration upsets are detectable via the readback feature of the Virtex without affecting the operation of the device. Such upsets can be corrected with either total or partial reconfiguration[6]. Without redundancy there will be an interruption of service. With redundant logic built into one device, partial reconfiguration may be able to repair the problem without interruption. Redundancy in logic will also mitigate transient upsets. Single chip redundancy will not improve reliability against failure due to functional interrupts such as a configuration control upset in which the entire device configuration is cleared, nor will it help against a JTAG TAP controller upset. It is important to consider that the sensitive cross-section of these failure modes is extremely small compared to other possible upsets.

A configuration control upset and the corresponding clearing of the device is detectable via readback, and recoverable via reconfiguration. For uninterrupted service, multiple device redundancy is required. The observability of an upset in the JTAG TAP controller is uncertain at this point. Recovery is achieved within 5 clock cycles by placing a pull-up resistor on TMS and clocking TCK. R. Katz provides a good description of the problem and recovery technique[7]. Once again, for complete uninterrupted service, multiple device redundancy is required. JTAG failure may result in contention of the device IO, resulting in increased current consumption. Damage could result if the device experiences sinking/sourcing currents in excess of maximum specifications. This depends, of course, on other components in the system design, which should be selected carefully to prevent the possibility of damage.
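The 5 clock recovery follows from the standard IEEE 1149.1 TAP controller state diagram, in which holding TMS high returns the controller to Test-Logic-Reset from any state. The sketch below models that state diagram and checks the worst case; it models only the standard state machine, not any Virtex-specific behavior, and the function name is ours.

```python
# IEEE 1149.1 TAP controller transitions: state -> (next if TMS=0, next if TMS=1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def clocks_to_reset(state, limit=16):
    """Count TCK cycles with TMS held high needed to reach Test-Logic-Reset."""
    clocks = 0
    while state != "Test-Logic-Reset":
        state = TAP_NEXT[state][1]   # TMS = 1 (pulled up)
        clocks += 1
        if clocks > limit:
            raise RuntimeError("state machine did not reach reset")
    return clocks


worst_case = max(clocks_to_reset(s) for s in TAP_NEXT)
```

Because all 16 encodings of the controller are valid states, an upset that lands the controller in any of them is cleared by this sequence.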

It is clear that an operational mode that tolerates occasional loss of data due to an interruption of service will be cost effective. Remote sensing appears to be one of the applications well suited to this operational concept. Alternatively, reliability can be achieved with multiple device redundancy. Considering the device performance, even with the increases in space and power due to redundancy, these devices may be suitable for many applications and can be more cost effective given the high development cost of a dedicated ASIC.

VI. THERMAL RELIABILITY CONSIDERATIONS

With a complex FPGA like the Virtex device, the total power dissipation per part will vary widely depending on the device function programmed and the operating frequency. The largest part in the family with a million equivalent system gates can generate 7 watts. The impact of this issue for SEE is minimal since the data latches are 6-transistor devices with active loads. The absence of polysilicon resistor structures, whose values can vary significantly with temperature, results in the Virtex SEE performance being relatively constant over temperature. It should be noted that the device temperature was monitored in-situ during the SEE testing and typically the case temperature was 30° to 45° C.

The major reliability consideration for satellite applications is thermal management in the vacuum of space. To take advantage of the large IO capability, Ball Grid Array (BGA) and Ceramic Column Grid Array (CGA) packages are utilized. A heat sink is available on the top of the package and is very useful in ground based environments to aid in heat dissipation via thermal radiation from the backside. However, in a satellite system the major heat conduction path is through the leads (or solder columns) to the printed circuit board. The CGA package is preferred since it will have somewhat greater thermal conduction than the BGA. Nevertheless, there remains a reliability risk if care is not taken to assure that the PC board and the package have a similar thermal expansion coefficient (TCE). A mismatch can result in long term wearout in the form of fractured lead connections if thermal cycles are frequent enough and of great enough range. Once again the CGA package is superior to the BGA in that the taller solder columns of the CGA allow some flexing to occur, thereby avoiding fracture, as compared to the more rigid solder balls of the BGA. The Los Alamos RCC program will place a high priority on TCE matching in printed circuit board design for these devices.

VII. CONCLUSIONS

The SEL performance of the commercial ZBT SRAMs renders them unsuitable for many space applications.

The Virtex SRAM based FPGA is latch-up immune and was characterized extensively for SEU performance. The projected frequency of single event upset on-orbit is greater than that of many traditional ASIC technologies; nonetheless, it is a good candidate for many applications such as remote sensing where some upsets can be tolerated in exchange for the dramatic increase in performance offered by Virtex.

The frequency of bit upsets can be tolerated through a combination of rapid detection and recovery, logic redundancy, and part redundancy if required. More investigations are being planned to test the effectiveness of these mitigation techniques in a system design.

Thermal management of these high performance parts is critical to on-orbit reliability.

Remote reconfigurability is clearly viable.

ACKNOWLEDGMENTS

The authors wish to thank Teratum Lowchareonkul and Rick Padovani of Xilinx for their tireless support of the Los Alamos effort, Steve Wallin of Los Alamos for his support at Texas A&M, and Mark Dunham of Los Alamos for his boundless and enthusiastic support of reconfigurable computing in space.

REFERENCES


