Close Enough to Correct: Approximate Computing

CTN Issue: May 2017

With the release of the Google Tensor Processing Unit paper we discovered that the cloud services folks were seriously thinking about fixed point computation. This is interesting because the DSP world, including the part that serves the wireless space, has made a significant effort to move towards floating point over the last decade. We could discuss the reasons for this switch at length but it is clear that it mostly comes down to ease of programming and a desire to minimize the power and area of computation. So this month I wanted to examine other ways of being approximately correct and saving some power and area in the process, aka approximate computing. In this article Prof. Ben Schaefer of UT Dallas will give us a quick look at how things work and the risks involved in ripping out gates and wires in an attempt to save some power. Your comments and questions are, as always, welcome.

Alan Gatherer EIC

Approximate Computing: Mitigating the Risk of Dynamic Workloads

Dr. Benjamin Carrion Schaefer, Assistant Professor at the Department of Electrical and Computer Engineering, University of Texas at Dallas

Gene Frantz
Dr. Benjamin
Carrion Schaefer

With the breakdown of Dennard’s scaling, power densities in today’s Integrated Circuits (ICs) are rapidly reaching unmanageable levels. The term Dark Silicon has been coined to describe the part of the circuitry of an IC that cannot be powered for a given thermal design constraint [1]. At the same time users are demanding more powerful portable devices and expect extended battery life.

As we reach the physical limits of CMOS technology, this implies that new solutions are needed to address the energy efficiency problem, or as in the case of approximate computing, extend old solutions to new problems.

Many classes of applications exhibit significant tolerance to inaccuracies in their computations. Some examples include image processing, multimedia applications and machine learning. These inaccuracies can be exploited to build circuits with smaller area, lower power and higher performance. Fig. 1 shows an example of the exact and different approximate solutions of the sobel edge detection algorithm.

Approximate computing is nevertheless not a new discipline. The numerical analysis discipline is built around the concept of approximate computing and has for many centuries addressed the question of how accurate a numerical solution is and how stable it is. Charles Babbage (1792-1891) aware of the inaccuracy of human calculations is credited for inventing the principle of the analytical machine (forerunner of the modern computer).

Figure 1. Sobel Edge Detection. Exact vs. different approximate solutions.

Figure 1. Sobel Edge Detection. Exact vs. different approximate solutions.

The DSP and VLSI design community is well aware of this and has widely study the effect of bit width reduction, saturations, rounding, and floating point to fixed point conversions to exactly do this. Most commercial High-Level Synthesis (HLS) tools (convert C to Verilog) already include these optimizations, where designers only need to specify the bit widths at the primary inputs and outputs and the tools optimize the internal bit widths accordingly.

As in many disciplines, the need to address a particular challenge has intensified the pursuit for better solutions. In the case of approximate computing, the number, type and application level of approximations has skyrocketed, reaching from the software level [2], the behavioral level (HLS) [3], the implementation [4], and physical level [5]. A good survey of these different techniques can be found at [6-8].

In the case of the edge detection example shown in Fig. 1, the approximate results were obtained by using different types of approximate address. In particular adders shown in Fig.2.

Figure 2. Schematics and Truth tables of different approximate full adder.

Figure 2. Schematics and Truth tables of different approximate full adder.

One of the problems of approximate computing is that the resultant approximate circuit is highly dependent on the training data. It has been shown that workloads in modern MPSoC-based systems are becoming increasingly dynamic, which cause changes in the nature of the workload over time [9]. This poses a significant risk to the IC design companies. Approximate methods and architectures are required to aid designers in the creation of approximate circuits that are stable enough to account for these workload changes. Developers need to know the risk they are taken when choosing a more or less aggressive approximate circuit.

This lack of well-defined risks models as well as the uncertainty about future workloads is hampering the wider adoption of more aggressive approximate circuits, which can are in turn much more energy efficient.

Fig. 3 shows trade-off curves of approximate micro-architectures obtained by applying different well known approximation optimizations, like variable to variable, variable to constant substitutions, and  approximate functional units to a set of SystemC designs obtained from the open source Synthesizable SystemC benchmarks suite (S2CBench [10]).

Figure 3. Area vs. Mean Absolute Average Error (MAPE) for approximate micro-architectures when training set match and does not match final workload.

Figure 3. Area vs. Mean Absolute Average Error (MAPE) for approximate micro-architectures when training set match and does not match final workload.

The two trade-off curves represent, for different approximate micro-architectures, the mean absolute average error (MAPE), when the final workload matches the training data and when it does not (in this case random, normal, positive and negative skew input data distributions were used). It can be observed that very good results can be obtained when the training distributions match the final workload, but that the error typically more doubles if it does not. The differences between the two curves of the same micro-architecture can be considered as the risk that the company will take if the final workload has not been fully characterized or will change in the future.

Different methods are required to mitigate, or at least control, this risk. Some examples include making use of reconfigurable logic, which can update the approximate micro-architecture when the workload changes, i.e. runtime reconfigurable hardware accelerators like Renesas Electronics Stream Transpose Processor (STP) [11], or developing hybrid architectures which can have an exact and an approximate operation mode. Moreover a wider-cross disciplinary multi-layer approach is necessary. There is enormous potential in approximate computing in the hardware-software co-design area, where applications need to communicate with the hardware the desired accuracy and the hardware makes runtime decisions to achieve the desired accuracy while maximizing energy-efficiency.

In the field of wireless communications programmable DSP has moved towards floating point but with 5G we have seen an uptick in the use of FPGA again. Both FPGA and hardware acceleration could benefit from approximate computing if the workloads are well understood, which they generally are. Circuits could be reprogrammed on the fly depending on the QAM constellation used, some measure of the channel conditions, or the known dataflow of the radio access technology, to take advantage of the right kind of approximate computing that minimizes the power in the circuitry, allowing the system to behave differently for IoT compared to eMBB for instance. In Figure 3 we see several common communications functions that appear to take good advantage of the approximate computation trade off of power versus performance and we believe that this technology could be part of the solution to get 5G to its power goals.

References

  1. M. Taylor, A landscape of the new dark silicon design regime, Micro, IEEE, Sept-Oct. 2013.
  2. M. Samadi, J. Lee, D. A. Jamshidi, A. Hormati, andS. Mahlke,SAGE: Self-tuning approximation for graphics engines, in Proc. Int. Symp. Microarchitect. ,2013, pp. 13–24.
  3. K. Nepal, Y. Li, R. I. Bahar, and S. Reda, ABACUS: A technique for automated behavioral synthesis of approximate computing circuits, in DATE, pp. 1–6, March 2014.
  4. V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and K. Roy, Impact: Imprecise adders for low-power approximate computing, in Low Power Electronics and Design (ISLPED) 2011 International Symposium on, pp. 409–414, Aug 2011.
  5. Y. Wu, C. Shen, Y. Jia and W. Qian, "Approximate logic synthesis for FPGA by wire removal and local function change," 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), Chiba, 2017, pp. 163-169.
  6. Q. Xu, T. Mytkowicz, and N. S. Kim, “Approximate computing: A survey,” IEEE Design Test, vol. 33, pp. 8–22, Feb 2016.
  7. S. Mittal, “A survey of techniques for approximate computing,” ACM Comput. Surv., vol. 48, pp. 62:1–62:33, Mar. 2016.
  8. J. Han and M. Orshansky, Approximate computing: An emerging paradigm for energy-efficient design, pp. 1–6, IEEE Computer Society, 2013.
  9. P. van Stralen and A. Pimentel, Scenario-based design space exploration of MPSoC, in 2010 IEEE International Conference on Computer Design, pp. 305–312, Oct 2010.
  10. S2Cbench v.1.2, https://sourceforge.net/projects/s2cbench/
  11. Renesas Electronics, Stream Transpose Processor, https://www.renesas.com/en-us/products/programmable.html.

Leave a comment

Statements and opinions given in a work published by the IEEE or the IEEE Communications Society are the expressions of the author(s). Responsibility for the content of published articles rests upon the authors(s), not IEEE nor the IEEE Communications Society.