Xilinx product reliability exceeds industry requirements by applying knowledge-based qualification methodologies and demonstrating world-class reliability results on leading-edge technology nodes. Before production release, all products must meet unique and stringent quality and reliability exit criteria, drawn from standards (JEDEC, IPC, IEC, AEC, MIL-STD), knowledge-based methodology, machine learning, and other sources.
Reliability margins have been shrinking over the years. Generation after generation, Xilinx has been flattening the “shrinking bathtub” by addressing infant mortality, also tracked as the early failure rate (EFR), thereby reducing defect density (DD) through:
Figure 1. Shrinking Bathtub: High-volume consumer markets such as mobile communications are shrinking the IC reliability margin. By focusing on design reliability, Xilinx has been able to meet these reliability requirements.
Leading-edge technologies call for evolutionary thinking about how the quality and reliability of components are tested and analyzed. Xilinx uses knowledge-based reliability qualification (KBQ) in addition to standards-based methods (JEDEC, AEC, MIL-STD, etc.) to combat the shrinking bathtub. Risk mitigation is also evolving: testing is combined with analysis based on deep learning and machine learning to exclude defects.
Reliability at the system level requires key learnings about, and knowledge of, customer use conditions and failure mechanisms. Xilinx Volume-System-Test (VST) characterizes devices at the system level, emulating typical application use conditions.
As part of our commitment to assure reliability, Xilinx exceeds the industry standards-based qualification requirements in order to understand failure mechanisms and evaluate reliability margins before product release. These results are achieved through a focus on die-level, package-level, and test-level defect reduction initiatives. The success of these efforts can be seen both in the line quality measured at customer sites and in the low FIT rates published in the Xilinx reliability report.
Process Technology | FIT Rate |
---|---|
16nm | 8 |
20nm | 11 |
28nm | 11 |
40nm | 10 |
45nm | 11 |
65nm | 6 |
90nm | 2 |
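The FIT figures above can be put in more familiar terms with a little arithmetic. The sketch below assumes the standard definition of FIT (one failure per 10^9 device-hours) and a constant failure rate, so that mean time to failure is simply the reciprocal of the rate. The function names and the fleet size in the usage note are illustrative and do not come from the Xilinx reliability report.

```python
# FIT is conventionally defined as failures per 10^9 device-hours.
HOURS_PER_BILLION = 1e9

# FIT rates copied from the table above, keyed by process node.
FIT_BY_NODE = {
    "16nm": 8, "20nm": 11, "28nm": 11, "40nm": 10,
    "45nm": 11, "65nm": 6, "90nm": 2,
}


def mttf_hours(fit):
    """Mean time to failure in hours, assuming a constant failure rate."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_BILLION / fit


def expected_failures(fit, devices, hours):
    """Expected number of failures for a fleet of devices over a time span."""
    return fit * devices * hours / HOURS_PER_BILLION


if __name__ == "__main__":
    ten_years = 10 * 8760  # hours, ignoring leap days
    for node, fit in FIT_BY_NODE.items():
        print(node, mttf_hours(fit), expected_failures(fit, 10_000, ten_years))
```

Under these assumptions, the 16nm figure of 8 FIT corresponds to an MTTF of 1.25e8 hours, and a hypothetical fleet of 10,000 devices running for ten years would be expected to see about 7 failures.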
Xilinx publishes a device reliability monitor report to give customers insight into the reliability of Xilinx products. The goal of the reliability program is continuous improvement in the robustness of each product being evaluated. As part of this program, finished-product reliability is measured on an ongoing, periodic basis to ensure that product performance meets or exceeds reliability specifications.
The Xilinx Reliability Estimator (XRE) tool was developed to help customers estimate the reliability performance and lifetime of products based on the customer's mission profile and use conditions. Designed from the ground up, the calculator estimates failure rates (FITs) for various customer-specified use conditions and durations.
The fundamental concepts of the XRE tool include:
Figure 1. Example of 28nm FIT Rate Calculation: The XRE tool takes into consideration the reliability device physics, along with the appropriate models and customer profiles to calculate an accurate FIT rate.
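The models inside the XRE tool are not described here, so the following is only a generic sketch of how a mission profile can be folded into a single FIT estimate. It assumes a textbook Arrhenius temperature-acceleration model and a time-weighted average over the profile. The activation energy, the reference FIT and temperature, and the profile itself are hypothetical inputs chosen for illustration, not Xilinx data.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_factor(temp_c, ref_temp_c, ea_ev):
    """Failure-rate multiplier at temp_c relative to ref_temp_c (Arrhenius)."""
    t = temp_c + 273.15
    t_ref = ref_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref - 1.0 / t))


def mission_profile_fit(ref_fit, ref_temp_c, ea_ev, profile):
    """Time-weighted FIT over a profile of (junction_temp_c, time_fraction)."""
    total = sum(fraction for _, fraction in profile)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError("time fractions must sum to 1")
    return sum(
        fraction * ref_fit * arrhenius_factor(temp_c, ref_temp_c, ea_ev)
        for temp_c, fraction in profile
    )


if __name__ == "__main__":
    # Hypothetical profile: 20% of the time at 40 C, 70% at 55 C, 10% at 85 C.
    profile = [(40.0, 0.2), (55.0, 0.7), (85.0, 0.1)]
    # Hypothetical reference: 10 FIT at 55 C with a 0.7 eV activation energy.
    print(mission_profile_fit(10.0, 55.0, 0.7, profile))
```

A real estimate would combine several failure mechanisms, each with its own model and parameters. The point of the sketch is only that time spent at higher junction temperatures contributes disproportionately to the overall rate.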
Ionizing radiation can induce undesired effects in most silicon devices. A single event upset (SEU) is an unintentional change of state caused by ionizing radiation in any integrated circuit, including ASIC, ASSP, FPGA, memory, logic, and mixed-signal devices.
Xilinx devices are designed to have an inherently low susceptibility to SEUs. Although SEUs are extremely rare and fully recoverable in Xilinx devices, Xilinx understands the need for the utmost in system reliability and availability, and that managing SEUs requires far more than simply estimating SEU Failures-In-Time (FIT). To that end, Xilinx provides system designers a comprehensive solution for SEU mitigation.
Xilinx SEU Solution
The foundation of reliability and availability is the silicon. Through continued innovation in circuit design and layout techniques, Xilinx has lowered the intrinsic SEU FIT of the silicon with each new generation, enabling most application deployments without any additional SEU mitigation. In addition, should an SEU occur, Xilinx provides rapid embedded error detection and correction that can restore the device state, such that the majority of SEUs will not result in system interruption.
Xilinx SEU FIT Trend
To maximize the integrity of designs in UltraScale devices, Xilinx offers industry-leading resilience to SEUs through more than 40 techniques spanning process, layout, circuit, and device architecture. Compared to 7 Series devices, UltraScale devices achieve up to a 3x reduction in SEU FIT.
UltraScale+ devices are the next step in Xilinx's continuing efforts to offer the most robust and comprehensive solution available. UltraScale+ devices contain additional design innovations and use FinFET transistor technology as a multiplier to gain substantial additional reduction in SEU FIT. Most applications will meet their reliability and availability requirements based on the inherent resilience of UltraScale+ devices without any additional SEU mitigation.
Xilinx uses only ultra-low alpha (ULA) packaging materials and actively monitors material suppliers to ensure compliance with ULA specifications.
To effectively manage SEUs, Xilinx offers free IP that can be leveraged to increase reliability and availability in applications requiring additional mitigation.
The Soft Error Mitigation (SEM) IP core is an automatically configured, pre-verified solution to detect, correct, and classify SEUs in Configuration RAM of Xilinx devices. The SEM IP core does not prevent SEUs; however, it provides a method to better manage their system-level effects. Proper management of SEUs increases reliability and availability, and reduces system maintenance and downtime costs. The SEM IP core remains in pre-production status until it has been fully tested and qualified through accelerated particle testing at a radiation effects facility.
Additionally, the Block Memory Generator IP core and FIFO Generator IP core provide optimized solutions for common memory structures, with flow-through error correcting code (ECC) support to virtually eliminate the effects of SEUs in BlockRAM and UltraRAM resources. The Memory Interface Solutions (MIS) IP core similarly provides flow-through ECC support for external DDR3 and DDR4 SDRAM memory systems.
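To illustrate how an error correcting code can hide a single upset bit from the rest of the system, here is a minimal sketch of a textbook Hamming(7,4) code. It is not the ECC scheme implemented in the Xilinx IP cores, which is not specified here, and the function names are illustrative.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword [p1,p2,d1,p4,d2,d3,d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # syndrome gives the 1-based position
    if error_position:
        c[error_position - 1] ^= 1  # flip the upset bit back
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a single-bit upset at position 5
    print(decode(word))
```

This code corrects any single-bit error in a codeword but cannot handle two simultaneous errors. Practical memory ECC typically adds an extra parity bit and uses wider words so that double-bit errors are at least detected.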
For applications demanding absolute safety or data integrity, Xilinx offers tools to assist in protection of critical design modules. The Isolation Design Flow (IDF) provides fault containment at the module level, enabling single-chip fault tolerance through supporting techniques such as redundancy, watchdog alarms, and logic segregation.
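Redundancy of the kind mentioned above is often realized as triple modular redundancy (TMR) with a majority voter. The sketch below models that idea in software: three copies of a module produce an integer result, a bitwise voter picks the majority value, and a flag plays the role of a watchdog alarm when the copies disagree. It illustrates the concept only. It is not part of the Isolation Design Flow, and the function names are hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority of three redundant integer outputs."""
    return (a & b) | (a & c) | (b & c)


def vote_with_alarm(a, b, c):
    """Return (voted_value, alarm); alarm is True if any copy disagrees."""
    voted = majority_vote(a, b, c)
    alarm = not (a == b == c)
    return voted, alarm


if __name__ == "__main__":
    # One of the three copies has a single upset bit (bit 3 cleared).
    print(vote_with_alarm(0b1010, 0b1010, 0b0010))
```

Because the vote is taken bit by bit, the output stays correct as long as no bit position is corrupted in more than one copy. The alarm lets the system schedule a repair before a second fault accumulates.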
While optimization by EDA tools typically improves quality of results, these tools may also optimize away design-level SEU mitigation, such as redundant circuits or modules. Xilinx offers tools and a methodology to ensure mitigation techniques are left intact and design functionality is preserved.
Analysis and verification are the most critical pieces for ensuring reliability and availability. Xilinx takes an open and direct approach to assessing SEU FIT. Xilinx stands alone in the publication of radiation effects data for commercial devices, via the Xilinx Device Reliability Report, and uses this data to support pre-design and post-design SEU FIT estimation for reliability and availability analysis.
In order to foster independent verification by interested users and the broader radiation effects community, the SEM IP core optionally provides convenient error injection, a capability enhanced by availability of data files providing a map of Configuration RAM locations essential to a given design’s operation. Further, Xilinx hardware debug tools support device Configuration RAM read back for verification during radiation effects tests.
Even while targeting zero defects in production and exceeding industry expectations for reliability and operating lifetime, Xilinx applies continuous improvement on a daily basis. It is in our DNA, and it is driven mostly by stringent markets and their longer-lifetime reliability requirements.
The Xilinx Continuous Improvement Action (CIA) process eliminates the causes of non-conformities to prevent their recurrence. An automatic escalation process throughout the management chain ensures that CIAs are addressed and closed. The Preventive Action Request (PAR) and Material Review Board (MRB) systems detect and eliminate the causes of potential non-conformities to prevent their occurrence or any impact on customers.
Figure 1. Over the last 8 years, RMAs have been reduced by more than 60% as a result of product quality, customer support, and direct engagement for issue resolution.