
Who ate my battery? Why software engineers are the key to low power software design

Author: Jeremy Bennett

License: CC-BY-2.0

                  Who ate my battery?
Why software engineers are the key to low power software design

                Jeremy Bennett, CEO Embecosm
                        22 March 2012
                                                                                         Agenda



 Designing energy aware systems

 Hardware and software working together
   – unified system debug


 Experience with OpenRISC




          Copyright © 2012 Embecosm. Freely available under a Creative Commons license
 Designing Energy Aware Systems




                                                                                       Why?




         Ericsson T65
– released 2001
– Li-Ion 720 mAh
– standby 300 h
– talk time 11 h
– includes talk/standby prediction

         Sony Ericsson Xperia X10 mini
– released 2010
– Li-polymer 930 mAh
– standby up to 285 h (3G) / 360 h (2G)
– talk time up to 4 h (2G) / 3.5 h (3G)
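The figures above imply an average current draw. A rough sketch, assuming the quoted capacities and times are nominal and that the draw is constant over each period (it never is in practice); `averageCurrent_mA` is a hypothetical helper:

```cpp
#include <cassert>

// Average current (mA) implied by a battery capacity (mAh) and a runtime (h).
// Assumes a constant draw over the whole period, which a real phone never has.
double averageCurrent_mA(double capacity_mAh, double runtime_h) {
    assert(runtime_h > 0.0);  // guard against division by zero
    return capacity_mAh / runtime_h;
}
```

With the talk-time figures, `averageCurrent_mA(720, 11)` gives about 65 mA for the T65 and `averageCurrent_mA(930, 4)` gives 232.5 mA for the X10 mini on 2G: roughly 3.5 times the average draw, from a larger battery.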


                                                 Free Mobile Apps
                                           “Drain Battery Faster”




                                                          Energy Saving Today

 Largely in the realm of hardware engineering
 Hardware design aims to minimize
   – static (leakage) power loss
   – dynamic (switching) power loss
 Techniques used
   – dynamic voltage and frequency scaling (DVFS)
   – multiple mode operation (standby, sleep, suspend, off)
 Scope for savings
   – $P = V^2/R$: power grows with the square of the voltage
   – on-chip voltage can range from ~0.5V to ~1.5V
   – lower frequencies mean lower voltages can be used
       win on both static and dynamic power loss
 Is this where the greatest savings can be made?
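Taking the simple relation above at face value, the quadratic term is what makes voltage the big lever. A minimal sketch, assuming a fixed resistance R so that it cancels out of any comparison (real CMOS switching power also depends on capacitance and clock frequency, which is why DVFS lowers both); the function names are illustrative:

```cpp
#include <cassert>

// Power dissipated in a fixed resistance R at voltage V: P = V^2 / R.
double power(double volts, double ohms) {
    assert(ohms > 0.0);
    return volts * volts / ohms;
}

// Ratio of the power at v1 to the power at v2 for the same resistance.
// R cancels, so the ratio depends only on (v1 / v2)^2.
double powerRatio(double v1, double v2) {
    assert(v2 > 0.0);
    const double r = v1 / v2;
    return r * r;
}
```

`powerRatio(1.5, 0.5)` is 9: across the on-chip voltage range quoted above, running at 1.5V rather than 0.5V dissipates nine times the power at a fixed R.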

                            Greater Savings at Higher Levels




[Chart: potential power savings by level of abstraction (Architectural, RTL Synthesis, Gate, Layout), on a scale of 0% to 100%]

                                                                              Source: LSI Logic

                                           Shouldn't We Look Higher?




 A Linux implementation famously wasted 70-90% of its energy,
  simply waking several times a second to drive a blinking cursor.



 A project had to raise its clock frequency because a standard
  codec caused excessive processor stalls through cache conflicts.
  That project was cancelled shortly afterwards.




                                                      Focussing on Software

 Software controls the hardware
   – algorithms and data flow
   – compiler optimization has traditionally put speed above everything else
 Few software engineers appreciate this
   – how an algorithm affects power consumption
   – power consumption is often a secondary design criterion in software
 Yet the biggest savings are at the higher levels of abstraction
   – choice of algorithm
   – data handling
   – entire software stack
 Why?
   – energy is consumed by the hardware computations
   – but ultimate control of that hardware lies with the software
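Data handling is one place where software choices show up directly as hardware activity. The two functions below compute the same sum over the same row-major matrix but visit memory in different orders. This is an illustrative sketch, assuming a processor with a data cache whose lines hold several `int`s; it measures nothing, and how much the access order matters (for time or energy) depends on the cache and the matrix size:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A rows x cols matrix stored row-major in one contiguous vector.
struct Matrix {
    std::size_t rows, cols;
    std::vector<int> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0) {}
    int& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    int at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Row by row: consecutive iterations touch adjacent addresses,
// so each cache line fetched is fully used before it is evicted.
long long sumRowMajor(const Matrix& m) {
    long long s = 0;
    for (std::size_t i = 0; i < m.rows; ++i)
        for (std::size_t j = 0; j < m.cols; ++j)
            s += m.at(i, j);
    return s;
}

// Column by column: consecutive iterations are cols * sizeof(int) bytes apart,
// so on large matrices almost every access can miss the cache.
long long sumColumnMajor(const Matrix& m) {
    long long s = 0;
    for (std::size_t j = 0; j < m.cols; ++j)
        for (std::size_t i = 0; i < m.rows; ++i)
            s += m.at(i, j);
    return s;
}
```

The result is identical either way; on large matrices the column-order loop typically causes far more cache misses, and with them more stalls and memory traffic.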


                            Greater Savings at Higher Levels



[Chart: potential power savings by level of abstraction (Application, Compiler, Programming Language, ISA, Architecture, RTL Synthesis, Gate, Layout), on a scale of 0% to 100%]

                                                                Source: Bennett & Eder, 2011

                                                                      This is Not New

Kaushik Roy and Mark C. Johnson. 1997. Software design for low
power. In Low power design in deep submicron electronics,
Wolfgang Nebel and Jean Mermet (Eds.). Kluwer Nato Advanced
Science Institutes Series, Vol. 337. Kluwer Academic Publishers,
Norwell, MA, USA, pp 433-460.

   Choose the best algorithm to fit the hardware
   Tune algorithms to manage memory size and memory access
   Optimize for performance, making best use of parallelism
   Use hardware support for power management
   Minimize CPU and data path switching in the generated code



                                            Hardware Power Analysis




 Late in the design flow
 Slow




                                                                             Source: Eder, 2011
                                      Software Power Analysis




                                                                     Source: Eder, 2011
                 Tractable Software Power Analysis




                                                                     Source: Eder, 2011
                                                              The Tool Challenge

 How can we estimate power at a high level?
 Existing power analysis tools
   – operate at gate level
   – accurate, but very slow
 Instruction level power analysis
   – Tiwari et al, 1996
   – highly parameterized formulaic approach
   – no data on accuracy
 Wattch: An architectural power analysis tool
   – Brooks et al, 2000
   – requires parameterized models of common functional blocks
   – accurate to ±10%, relatively fast
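Instruction level power analysis in the style of Tiwari et al. estimates a program's energy from tables of per-instruction costs: a base cost for each instruction executed, an inter-instruction overhead for each pair of consecutive instructions, and other effects such as stalls and cache misses. A minimal sketch of the first two terms only, with hypothetical opcode names and made-up costs (a real model is calibrated by measuring the target):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model parameters (units: nJ). Real values would come from
// current measurements on the target processor.
struct InstructionEnergyModel {
    std::map<std::string, double> baseCost;                               // per instruction
    std::map<std::pair<std::string, std::string>, double> switchOverhead; // per consecutive pair
    double defaultOverhead = 0.0;  // used for pairs with no measured value
};

// Energy of an executed instruction trace: the sum of base costs plus an
// inter-instruction overhead for each consecutive pair. Other effects
// (stalls, cache misses) are deliberately left out of this sketch.
double estimateEnergy(const InstructionEnergyModel& model,
                      const std::vector<std::string>& trace) {
    double energy = 0.0;
    for (std::size_t k = 0; k < trace.size(); ++k) {
        energy += model.baseCost.at(trace[k]);  // throws if the opcode is unknown
        if (k > 0) {
            auto it = model.switchOverhead.find({trace[k - 1], trace[k]});
            energy += (it != model.switchOverhead.end()) ? it->second
                                                          : model.defaultOverhead;
        }
    }
    return energy;
}
```

For the trace add, load, add with base costs 1 and 3, an overhead of 0.5 for add followed by load and 0.25 otherwise, the estimate is 1 + 3 + 1 + 0.5 + 0.25 = 5.75.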



                                    What Have We Learned So Far?

 Use a low power instruction encoding
   – minimize Hamming distance between consecutive instructions
         reduces switching
   – the encoding is highly specific to the target application (profiling is needed to create it)
   – up to 62% reduction in opcode switching observed
   – Woo et al, 2001
 Partition the register file
   –   25% of registers account for 83% of access time
   –   partition into “hot” and “cold” registers
   –   average 54% energy savings compared to non-partitioned
   –   Guan & Fei, 2010
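The switching cost of an instruction encoding can be approximated by the Hamming distance between consecutive instruction words. A minimal sketch, assuming a straight-line stream of 32-bit words fetched in order (branches are ignored) and counting every bit, whereas Woo et al. look specifically at opcode switching; the helper names are hypothetical:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bit positions that differ between two 32-bit instruction words.
int hammingDistance(std::uint32_t a, std::uint32_t b) {
    return static_cast<int>(std::bitset<32>(a ^ b).count());
}

// Total bit toggles when the words are fetched one after another.
// A lower total suggests less switching activity for that encoding.
long long totalSwitching(const std::vector<std::uint32_t>& words) {
    long long toggles = 0;
    for (std::size_t i = 1; i < words.size(); ++i)
        toggles += hammingDistance(words[i - 1], words[i]);
    return toggles;
}
```

Comparing `totalSwitching` for two candidate encodings of the same profiled instruction stream indicates which one toggles fewer lines.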




                                                       Higher Abstraction Still

 Specialist programming languages
   –   allow the programmer to exploit power/accuracy tradeoffs
   –   use approximate types where appropriate
   –   programmer annotates the code
   –   type checker separates out approximate and accurate code
   –   Sampson et al 2011
 Hardware support for approximate computation
   – variable bit width in floating point calculations
   – up to 66% power saving
   – Tong et al 2000
 These are niche examples. What is the generic solution?
   – key is tools that allow software designers to explore solutions
   – profile energy usage as easily as performance
         tools from Lauterbach and research by Steve Kerrison at Bristol University
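Sampson et al.'s EnerJ expresses approximate types as Java type annotations, with an explicit `endorse` operation to move approximate data into precise code. The same idea can be sketched in C++ as a wrapper type. `Approx` and `preciseIndex` are hypothetical names, and this wrapper only enforces the separation at compile time: it makes nothing cheaper at run time, which would need hardware or compiler support:

```cpp
#include <cassert>

// Marks a value as approximate. It is deliberately not implicitly
// convertible to T, so the compiler rejects approximate data wherever
// precise data is expected.
template <typename T>
class Approx {
public:
    explicit Approx(T v) : value_(v) {}

    // Arithmetic on approximate values stays approximate.
    friend Approx operator+(Approx a, Approx b) { return Approx(a.value_ + b.value_); }
    friend Approx operator*(Approx a, Approx b) { return Approx(a.value_ * b.value_); }

    // The only way back to a precise value: an explicit, searchable endorsement.
    T endorse() const { return value_; }

private:
    T value_;
};

// Precise code accepts only plain values.
int preciseIndex(int i) { return i; }

// preciseIndex(Approx<int>(3));            // does not compile: no implicit conversion
// preciseIndex(Approx<int>(3).endorse());  // compiles: the programmer opted in
```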

                                        The Energy Aware Computing
                                                    Initiative (EACO)

 The ultimate goal is a Europe-wide program of research
 Led by the Institute for Advanced Studies at Bristol University
 Kicked off with three dedicated workshops in 2011
   – http://www.bristol.ac.uk/ias/workshops/current-workshops/energy-aware-computing.html

 Intellectual challenges
   – incremental improvements
   – radically new innovative approaches
 Conveners
   – Dr Kerstin Eder
   – Prof David May
 More collaborators wanted
   – contact Kerstin Eder



Hardware and Software Working Together
         Unified System Debug




                                    A Typical SoC Design Flow



                 Exploration & System Design   Implementation          System Verification   Silicon
Hardware Team    SystemC TLM                   Simulation & CA Model   FPGA or Emulation     Silicon
Software Team    ISS + Debugger                ISS + Debugger          ISS/ICE + Debugger    Silicon + Debugger (?)




                                                       The Technical Solution

 Embedded software tools need two key features
   – they must be “peripheral aware”
        when the program halts, the peripherals halt
        the debugger has visibility into the peripherals
   – they must work with models as well as final silicon
        models of the complete SoC
        high level, low level, software or FPGA emulation


 This is not a technical challenge
   – most debuggers extend easily to peripherals
   – JTAG provides a good abstraction of the interface
   – the EDA world knows how to model SoCs




                                  Adding “Peripheral Awareness”


Reading peripherals
–   GDB info command

    (gdb) info spr picmr
    PIC.PICMR = SPR9_0 = 0 (0x0)
    (gdb)

Writing peripherals
–   GDB set command

    (gdb) set spr picmr 0x00000007
    PIC.PICMR (SPR9_0) set to 7 (0x7), was: 0 (0x0)
    (gdb)

Watchpoints
–   new GDB command
–   depends on target abilities

    (gdb) pwatch picsr
    Peripheral watchpoint 2: PIC.PICSR (SPR9_2)
    (gdb)
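The SPR9_0-style names in the transcript follow the OpenRISC 1000 convention of a special-purpose register group (9 is the programmable interrupt controller) plus an index within that group. A minimal sketch of the address arithmetic, assuming the architecture's split of a 16-bit SPR number into a 5-bit group and an 11-bit index; `sprAddress`, `sprGroup` and `sprIndex` are hypothetical helpers, not part of GDB:

```cpp
#include <cassert>
#include <cstdint>

// OpenRISC 1000 SPR numbers: bits 15..11 select the group,
// bits 10..0 select the register within that group.
constexpr std::uint16_t sprAddress(unsigned group, unsigned index) {
    return static_cast<std::uint16_t>(((group & 0x1Fu) << 11) | (index & 0x7FFu));
}

constexpr unsigned sprGroup(std::uint16_t addr) { return addr >> 11; }
constexpr unsigned sprIndex(std::uint16_t addr) { return addr & 0x7FFu; }

// The PIC registers from the transcript: SPR9_0 and SPR9_2.
constexpr std::uint16_t PICMR = sprAddress(9, 0);  // 0x4800
constexpr std::uint16_t PICSR = sprAddress(9, 2);  // 0x4802
```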




                       Peripheral Awareness Using the GDB
                                    Remote Serial Protocol


 Extend using standard Remote Serial Protocol (RSP) features
   – A reliable packet based protocol


 qRcmd packet used to access peripherals
   – e.g. readspr to read a peripheral register
   – e.g. writespr to write a peripheral register


 Future proof against GDB upgrades
   – RSP compatibility is always ensured
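RSP packets are framed as `$data#cc`, where `cc` is the modulo-256 sum of the data bytes written as two hex digits, and the standard `qRcmd` packet carries a command string encoded as hex. A minimal sketch of building a peripheral-read request; the `readspr` argument syntax (here a hex SPR number) and the helper names are assumptions, not the documented interface of any particular target:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Frame an RSP packet: $<data>#<checksum>, where the checksum is the
// modulo-256 sum of the data bytes as two lower-case hex digits.
std::string rspFrame(const std::string& data) {
    unsigned sum = 0;
    for (unsigned char c : data) sum = (sum + c) & 0xFFu;
    char cks[3];
    std::snprintf(cks, sizeof cks, "%02x", sum);
    return "$" + data + "#" + cks;
}

// qRcmd carries its command as ASCII hex, two digits per byte.
std::string hexEncode(const std::string& text) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : text) {
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// Hypothetical peripheral read request; the argument format is assumed.
std::string readSprPacket(unsigned sprAddr) {
    char cmd[32];
    std::snprintf(cmd, sizeof cmd, "readspr %x", sprAddr);
    return rspFrame("qRcmd," + hexEncode(cmd));
}
```

`rspFrame("g")` gives `$g#67`, and `readSprPacket(0x4800)` wraps the text `readspr 4800` in a framed `qRcmd` packet.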




                                              Debugging Models and Silicon

SystemC TLM 2.0
–   modeled as debug I/F

    class SocTlmModel
      : public sc_core::sc_module
    {
      …
      tlm::tlm_transport_dbg_if<JtagPayload> jtagPort;

Cycle accurate/simulation
–   modeled as pins

    class SocCycleModel
      : public sc_core::sc_module
    {
      …
      sc_in<bool> jtagTck;
      sc_in<bool> jtagTms;

FPGA/Emulation/ASIC
–   drives hardware interface

    static void
    jp1_ll_reset_jp1()
    {
      …
      write (lp, &data, sizeof (data));
      JP1_WAIT ();




                                               Unified Debug Solution


Firmware world: a debugger (e.g. GDB) talks a debugger protocol (e.g. GDB RSP) to a Unified JTAG Abstraction Layer

Hardware world, reached through the abstraction layer:
   – hand-written TLM, via a TLM 2.0 JTAG model
   – simulation or CA model, via JTAG simulation
   – emulation or FPGA, via a JTAG driver
   – silicon, via a JTAG driver

                                                                JTAG Abstraction
                                                            Generic Class Diagram


[Class diagram: RspServerSC and JtagSC, linked by JtagRegister]

A generic GDB Remote Serial Protocol server communicates with a generic JTAG target by passing it a generic JTAG register.

Both the RSP server and the JTAG target are abstract classes. Concrete derived classes are created for specific architectures and specific JTAG targets.
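The shape of this design can be sketched in plain C++ without SystemC. The names below (`JtagRegister`, `JtagTarget`, `LoopbackTarget`, `exchangeDr`) are hypothetical simplifications, not Embecosm's classes: code written only against the abstract target, standing in here for the RSP server, works unchanged whichever concrete target is plugged in.

```cpp
#include <cassert>
#include <cstdint>

// Simplified JTAG register: which chain to shift, how many bits,
// and the bits themselves (LSB first).
struct JtagRegister {
    enum class Kind { IR, DR } kind;
    int bitLength;
    std::uint64_t bits;
};

// Abstract JTAG target: the server side only ever uses this interface.
class JtagTarget {
public:
    virtual ~JtagTarget() = default;
    virtual void reset() = 0;
    // Shift reg.bits in; replace them with the bits shifted out.
    virtual void shift(JtagRegister& reg) = 0;
};

// A software-only target: each chain remembers the last value shifted in
// and hands back the previous one. Model-based or hardware targets would
// implement the same two methods.
class LoopbackTarget : public JtagTarget {
public:
    void reset() override { ir_ = 0; dr_ = 0; }
    void shift(JtagRegister& reg) override {
        assert(reg.bitLength >= 0);
        std::uint64_t& chain = (reg.kind == JtagRegister::Kind::IR) ? ir_ : dr_;
        const std::uint64_t mask =
            (reg.bitLength >= 64) ? ~0ull : ((1ull << reg.bitLength) - 1);
        const std::uint64_t out = chain & mask;
        chain = reg.bits & mask;
        reg.bits = out;
    }
private:
    std::uint64_t ir_ = 0, dr_ = 0;
};

// Architecture-neutral code, written only against the interface.
std::uint64_t exchangeDr(JtagTarget& target, std::uint64_t value, int bits) {
    JtagRegister reg{JtagRegister::Kind::DR, bits, value};
    target.shift(reg);
    return reg.bits;
}
```

Supporting another kind of target means adding another class that implements `reset` and `shift`; `exchangeDr` does not change.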


                                                                        JTAG Abstraction
                                                                         Specific Targets


[Class diagram: RspServerSC, JtagRegister and JtagSC as before, with JtagTlmSC, JtagCycleModelSC, JtagVpiSC and Jtag2232SC derived from JtagSC]

A set of derived JTAG classes provides interfaces to common targets, independent of the processor architecture being supported.



                                                                   JTAG Abstraction
                                                                Specific Architecture


[Class diagram: as before, adding ArchRspServerSC derived from RspServerSC and several ArchJtagRegType register types derived from JtagRegister; the JTAG target classes (JtagTlmSC, JtagCycleModelSC, JtagVpiSC, Jtag2232SC) are unchanged]




                                Using SystemC TLM to model JTAG

 Use extensions for JTAG specific features
   – address, bit length, command
 Allows use of generic payload
   – maximum portability

[Class diagram: tlm_generic_payload (get_command (), set_command (), get_address (), set_address (), get_data_length (), set_data_length (), get_extension ()) with three optional (0..1) extensions:
   – JtagPayloadAddress: RESET, IR, DR; getAddress (), setAddress ()
   – JtagPayloadBitLength: getBitLength (), setBitLength ()
   – JtagPayloadCommand: IGNORE, READ, WRITE, WRITE_READ; getCommand (), setCommand ()]




                                                                      Example Adapter

[Diagram: the TLM thread's blocking TLM call passes to a TAP state machine in the cycle accurate thread, which works the SystemC signals tck, tdi, tdo, tms and trst before the blocking TLM call returns]

                                                    Reference: Application Note EAN5
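The TAP state machine in such an adapter is the standard IEEE 1149.1 TAP controller: sixteen states, advanced on each rising edge of tck according to the value on tms. A minimal, SystemC-free sketch of the state machine alone (not of the EAN5 adapter itself), with hypothetical names:

```cpp
#include <cassert>
#include <initializer_list>

// The 16 states of the IEEE 1149.1 TAP controller.
enum class TapState {
    TestLogicReset, RunTestIdle,
    SelectDrScan, CaptureDr, ShiftDr, Exit1Dr, PauseDr, Exit2Dr, UpdateDr,
    SelectIrScan, CaptureIr, ShiftIr, Exit1Ir, PauseIr, Exit2Ir, UpdateIr
};

// Next state on a rising edge of tck, given the value sampled on tms.
TapState nextState(TapState s, bool tms) {
    using S = TapState;
    switch (s) {
    case S::TestLogicReset: return tms ? S::TestLogicReset : S::RunTestIdle;
    case S::RunTestIdle:    return tms ? S::SelectDrScan   : S::RunTestIdle;
    case S::SelectDrScan:   return tms ? S::SelectIrScan   : S::CaptureDr;
    case S::CaptureDr:      return tms ? S::Exit1Dr        : S::ShiftDr;
    case S::ShiftDr:        return tms ? S::Exit1Dr        : S::ShiftDr;
    case S::Exit1Dr:        return tms ? S::UpdateDr       : S::PauseDr;
    case S::PauseDr:        return tms ? S::Exit2Dr        : S::PauseDr;
    case S::Exit2Dr:        return tms ? S::UpdateDr       : S::ShiftDr;
    case S::UpdateDr:       return tms ? S::SelectDrScan   : S::RunTestIdle;
    case S::SelectIrScan:   return tms ? S::TestLogicReset : S::CaptureIr;
    case S::CaptureIr:      return tms ? S::Exit1Ir        : S::ShiftIr;
    case S::ShiftIr:        return tms ? S::Exit1Ir        : S::ShiftIr;
    case S::Exit1Ir:        return tms ? S::UpdateIr       : S::PauseIr;
    case S::PauseIr:        return tms ? S::Exit2Ir        : S::PauseIr;
    case S::Exit2Ir:        return tms ? S::UpdateIr       : S::ShiftIr;
    case S::UpdateIr:       return tms ? S::SelectDrScan   : S::RunTestIdle;
    }
    return S::TestLogicReset;  // unreachable: every state is handled above
}

// Apply a sequence of tms values, one per tck cycle.
TapState clockTms(TapState s, std::initializer_list<bool> tmsBits) {
    for (bool tms : tmsBits) s = nextState(s, tms);
    return s;
}
```

Five cycles with tms high reach Test-Logic-Reset from any state, which is why drivers commonly use that sequence as a reset. From Run-Test/Idle, tms 1, 0, 0 reaches Shift-DR and 1, 1, 0, 0 reaches Shift-IR.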




        Experience with OpenRISC




                                            The OpenRISC 1000 Project

 Objective to develop a family of open source RISC designs
   – 32 and 64-bit architectures
   – floating point support
   – vector operation support
 Key features
   –   fully free and open source
   –   linear address space
   –   register-to-register ALU operations
   –   two addressing modes
   –   delayed branches
   –   Harvard or Stanford memory MMU/cache architecture
   –   fast context switch
 Looks rather like MIPS or DLX

                                                             The OpenRISC 1200



[Block diagram: OpenRISC 1200 CPU and ALU, with debug unit (JTAG), power management, tick timer, PIC, instruction and data MMUs and caches, and Wishbone bus interfaces]

 32-bit Harvard RISC architecture
   –   MIPS/DLX like instruction set
   –   first in OpenRISC 1000 family
   –   originally developed 1999-2001
 Open source under the GNU Lesser General Public License
   –   allows reuse as a component
 Configurable design
   –   caches and MMUs optional
   –   core instruction set
 Source code Verilog 2001
   –   approx 32k lines of code
 Full GNU tool chain and Linux port
   –   various RTOS ported as well




                                                     Hardware Development

 Objective is to use an open source EDA tool chain
   – back end tools for FPGA are all proprietary
        free (as in beer) versions available
   – front end tools now have open source alternatives
 OpenRISC 1000 simulation models
   – Or1ksim: golden reference ISS
        C/SystemC interpreting ISS, 2-5 MIPS
   – Verilator cycle accurate model from the Verilog RTL
        130kHz in C++ or SystemC
   – Icarus Verilog event driven simulation
        1.4kHz, 50x slower than commercial alternatives
 All OpenRISC 1000 simulation models suitable for SW use
   – all support GDB debug interface
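The rates above translate directly into turnaround time. A back-of-the-envelope sketch, assuming a workload of 10^8 target cycles and treating Or1ksim's instruction rate as comparable to a cycle rate (about one instruction per cycle), which is only an approximation; `simulationSeconds` is a hypothetical helper:

```cpp
#include <cassert>

// Wall-clock seconds to simulate a workload, given the simulator's
// throughput in target cycles (or instructions) per second.
double simulationSeconds(double targetCycles, double cyclesPerSecond) {
    assert(cyclesPerSecond > 0.0);
    return targetCycles / cyclesPerSecond;
}
```

With 10^8 cycles, Or1ksim at 2 MIPS takes 50 s, the Verilator model at 130 kHz about 770 s (around 13 minutes), and Icarus Verilog at 1.4 kHz about 71,400 s (nearly 20 hours).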



                                                   The Software Tool Chain

 A standard GNU tool chain
   –   binutils 2.20.1
   –   gcc 4.5.1
   –   gdb 7.2
   –   C and C++ language support
 Library support
   – static libraries only
   – newlib 1.18.0 for bare metal (or32-elf-*)
   – uClibc 0.9.32 for Linux applications (or32-linux-*)
 Testing
   – regression tested using Or1ksim (both tool chains)
   – or32-linux-* regression tested on hardware
   – or32-elf-* regression tested on a Verilator model


                                                       Board and OS Support

 Boards with BSP implementations
   – Or1ksim
   – DE0-Nano
   – Terasic DE-2
 RTOS support
   – FreeRTOS, RTEMS and eCos all ported
 Linux support
   – adopted into Linux 3.1 kernel mainline
   – some limitations (kernel debug, ptrace)
   – BusyBox as application environment
 Debug interfaces
   – JTAG for bare metal
   – gdbserver over Ethernet for Linux applications


                                        Comparative Regression Testing
                                                 of the OpenRISC 1200

                              Golden SystemC TLM Model    Verilator SystemC RTL Model
                              === gcc Summary ===         === gcc Summary ===

# of expected passes          52753                       52677
# of unexpected failures      152                         228
# of expected failures        77                          77
# of unresolved testcases     122                         122
# of unsupported tests        716                         716



 We can identify two types of problem
         – tests which time out with the RTL model for reasons other than its slower execution
         – tests which give a different result with the RTL model
 These are candidates for possible RTL errors
 Used commercially by Adapteva Inc
         – 50-60 RTL errors eliminated before tape-out
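The comparison reduces to a set difference over per-test results. The totals above are consistent with 76 tests moving from pass to fail (52753 - 52677 = 228 - 152 = 76), though the summary alone cannot show which tests. A minimal sketch, assuming each DejaGnu `.sum` file has already been parsed into a map from test name to outcome (parsing not shown; the `Outcome` enum is a hypothetical simplification of DejaGnu's result categories):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified outcomes; real DejaGnu summaries distinguish more categories.
enum class Outcome { Pass, Fail, Unresolved, Unsupported };

// Tests that pass on the golden reference but not on the model under test:
// the candidates for a closer look at the RTL.
std::vector<std::string> rtlSuspects(const std::map<std::string, Outcome>& golden,
                                     const std::map<std::string, Outcome>& rtl) {
    std::vector<std::string> suspects;
    for (const auto& entry : golden) {
        if (entry.second != Outcome::Pass) continue;
        auto it = rtl.find(entry.first);
        if (it == rtl.end() || it->second != Outcome::Pass)
            suspects.push_back(entry.first);
    }
    return suspects;  // in name order, because std::map is sorted
}
```

Each suspect still needs triage into one of the two types of problem listed above before it is reported as an RTL bug.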


Summary and Acknowledgements




                                                                                   Summary

 Future low power products will require a systems approach
   – hardware and software engineers must work together
   – the approach applies throughout the lifecycle
 The greatest opportunity for power saving is in the software
   – techniques for tackling this are still in their infancy
   – we need breakthroughs in high level power modeling and simulation
 We need a systems oriented tool chain
   – geared to the needs of both software and hardware
   – usable throughout the product lifecycle
 Embecosm's unified debugging approach is an example
   – allows software debugging throughout the lifecycle
 The benefits can be seen already in the OpenRISC project
   – hardware bugs identified by the software engineers

                                                               Acknowledgements




 Most of the work described in the first section of this
  presentation on energy aware computing is due to my colleague,
  Dr Kerstin Eder of the University of Bristol.

 OpenRISC is a community project, to which I am just one of the
  contributors. It is the cumulative result of 12 years' work by a
  very large number of people.




                                                                   Thank You




                      Thank You

                     www.embecosm.com
                     www.opencores.org



