Description

Future SoCs will see a continued exponential increase in transistor capacity, which creates both a complexity problem and a reliability problem. We propose to rededicate a fraction of this abundant transistor capacity to implementing organic computing properties. The resulting system will offer increased fault tolerance, better performance and power efficiency, easier system diagnosis, and the capability to adapt autonomously to changing environmental conditions, whether externally imposed workloads or temperature variations. The project will extend current SoCs rather than replace them, so that existing IP can be reused.

Such a conceptual shift in the approach to IC design requires a fresh and holistic view of the implications for SoC architecture platforms, the SoC design method and the corresponding EDA design tools.

The Autonomic System on Chip (ASoC) project will investigate and develop:

  • A new SoC design method capable of dealing with graceful degradation and redundancy in distributed integrated systems. It is able to model and optimize structural changes within the SoC architecture in the event of failure or suboptimal performance.
  • A corresponding ASoC architecture platform which extends the state-of-the-art IP library design approach towards self-organization and exploits the capabilities of the new design method and tools. We believe that the ASoC design method and ASoC architecture platform must be developed in a seamlessly interlocked manner (similar to microprocessor architecture and compiler co-design) in order to be effective.
  • New on-chip component supervision and validation techniques which verify individual and interdependent functional macros. Since functional test coverage of macros with several hundred million transistors cannot be exhaustive, we need to search for efficient means to evaluate the dynamic behaviour of SoC IP macros in real time.
  • Dynamic, coupled power-performance management in SoCs.
  • Flexible and dynamic hardware-software (HW-SW) repartitioning, which allows an equivalent software process to replace defective hardware, or dynamically loaded hardware configurations to replace a low-performing software process.

The ASoC will be self-organizing: it will continuously try to find by itself, without external intervention, the most suitable configuration for keeping the system in reliable, fault-free and functional operation while ensuring the best possible performance.

The ASoC will be self-healing: it attempts to replace a faulty processing unit with an equivalent counterpart which takes over the functionality of the failing element. The replacement unit can be either an idle stand-by element or a processing unit that was performing other tasks before the error occurred.
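The replacement step described above can be illustrated with a minimal software sketch. It is not the project's mechanism: the `Unit` record, the `kind` field used to decide equivalence, and the policy of preferring the least-loaded candidate (so that an idle stand-by unit wins over a busy one) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unit:
    """Hypothetical model of one processing unit on the chip."""
    name: str
    kind: str                               # functional type; equal kinds count as equivalent
    tasks: List[str] = field(default_factory=list)
    healthy: bool = True


def find_replacement(failed: Unit, units: List[Unit]) -> Optional[Unit]:
    """Pick a healthy unit of the same kind, preferring the least-loaded one.

    An idle stand-by unit has zero tasks and is therefore chosen first;
    otherwise a unit that is already busy takes over (assumed policy).
    """
    candidates = [
        u for u in units
        if u is not failed and u.healthy and u.kind == failed.kind
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda u: len(u.tasks))


def heal(failed: Unit, units: List[Unit]) -> Optional[Unit]:
    """Mark a unit as faulty and move its tasks to a replacement, if one exists."""
    failed.healthy = False
    replacement = find_replacement(failed, units)
    if replacement is None:
        return None                         # no equivalent unit left
    replacement.tasks.extend(failed.tasks)
    failed.tasks = []
    return replacement
```

In this sketch, a failing unit hands its tasks to an idle unit of the same kind when one is available; once no idle unit remains, a busy unit of that kind absorbs the extra work, and `heal` returns `None` when no equivalent unit exists at all.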

The self-healing concept covers not only fixing errors but also preventing them, e.g. in cases where the system risks entering a critical state (such as component overload or excessive power consumption). This implies that the ASoC must be able to supervise the behaviour of its constituent components and build up fallback scenarios, which are activated under certain trigger conditions before the system fails. The fallback solution only has to be good enough to respect the constraints and provide a reasonable quality of service. Once a fallback solution has been deployed, the self-organization process will again try to improve performance and eventually switch back to the original system configuration.
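The trigger-and-return behaviour described above can be sketched as a small supervisor with two thresholds. This is an illustrative assumption rather than the project's design: the `Supervisor` class, the single scalar reading (for example a temperature or a load figure), and the use of a lower `release` threshold to avoid flipping back and forth are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Supervisor:
    """Hypothetical supervisor for one monitored quantity of a component."""
    trigger: float             # activate the fallback at or above this reading
    release: float             # return to the original configuration below this reading
    config: str = "original"   # currently active configuration

    def observe(self, reading: float) -> str:
        """Update the active configuration from one new sensor reading."""
        if self.config == "original" and reading >= self.trigger:
            # Trigger condition met before failure: deploy the fallback.
            self.config = "fallback"
        elif self.config == "fallback" and reading < self.release:
            # Conditions have recovered: switch back to the original setup.
            self.config = "original"
        return self.config


# Example values are made up: trigger at 85, release below 70.
supervisor = Supervisor(trigger=85.0, release=70.0)
history = [supervisor.observe(r) for r in [60, 80, 86, 80, 72, 69, 60]]
```

The gap between `trigger` and `release` keeps the system in the fallback configuration while readings hover between the two values, which is one simple way to make the return to the original configuration deliberate rather than immediate.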

The architecture-platform aspects of the ASoC project will be developed in close interlock with the design methodology and its corresponding tools (similar to best practice in processor micro-architecture and compiler co-design). The tool to be developed will guide and support the ASoC design engineer in instantiating the right number and the right type of organic IP components, which allows the designer to cope with the higher complexity. What follows are the individual goals for the ASoC architecture platform and the ASoC design methodology.