Authors Jeremy Bennett
License CC-BY-2.0
Who ate my battery?
Why software engineers are the key to low power software design
Jeremy Bennett, CEO Embecosm
22 March 2012

Agenda
Designing energy aware systems
Hardware and software working together – unified system debug
Experience with OpenRISC

Copyright © 2012 Embecosm. Freely available under a Creative Commons license

Designing Energy Aware Systems

Why?
Ericsson T65
  – released 2001
  – Li-Ion 720 mAh
  – standby 300 h
  – talk time 11 h
  – includes talk/standby prediction
Sony Ericsson Xperia X10 mini
  – released 2010
  – Li-polymer 930 mAh
  – standby up to 285 h (3G) / 360 h (2G)
  – talk time up to 4 h (2G) / 3.5 h (3G)

Free Mobile Apps “Drain Battery Faster”

Energy Saving Today
Largely in the realm of hardware engineering
Hardware design aims to minimize
  – static (leakage) power loss
  – dynamic (switching) power loss
Techniques used
  – dynamic voltage and frequency scaling (DVFS)
  – multiple mode operation (standby, sleep, suspend, off)
Scope for savings
  – P = V²/R
  – on-chip voltage can range from ~0.5V to ~1.5V
  – lower frequencies mean lower voltages can be used
      a win on both static and dynamic power loss
Is this where the greatest savings can be made?

Greater Savings at Higher Levels
[Figure: bar chart on a 0% to 100% axis across the design levels Architectural, RTL, Synthesis, Gate, Layout. Source: LSI Logic]

Shouldn't We Look Higher?
A Linux implementation famously wasted 70-90% of its energy, simply waking several times a second to drive a blinking cursor.
A project had to raise its clock frequency because a standard codec caused excessive processor stalls through cache conflicts. That project was cancelled shortly afterwards.

Focussing on Software
Software controls the hardware
  – algorithms and data flow
  – compiler optimization traditionally favors speed over everything
Few software engineers appreciate this
  – how does an algorithm affect the power consumption?
  – power consumption is often a secondary design criterion in software
Yet biggest savings are at the higher levels of abstraction
  – choice of algorithm
  – data handling
  – entire software stack
Why?
  – energy is consumed by the hardware computations
  – but ultimate control of that hardware lies with the software

Greater Savings at Higher Levels
[Figure: bar chart on a 0% to 100% axis across the levels Application, Compiler, Programming Language, ISA, Architecture, RTL, Synthesis, Gate, Layout. Source: Bennett & Eder, 2011]

This is Not New
Kaushik Roy and Mark C. Johnson. 1997. Software design for low power. In Low power design in deep submicron electronics, Wolfgang Nebel and Jean Mermet (Eds.). Kluwer Nato Advanced Science Institutes Series, Vol. 337. Kluwer Academic Publishers, Norwell, MA, USA, pp 433-460.
Choose the best algorithm to fit the hardware
Tune algorithms to manage memory size and memory access
Optimize for performance, making best use of parallelism
Use hardware support for power management
Minimize CPU and data path switching in the generated code

Hardware Power Analysis
Late in the design flow
Slow
[Figure. Source: Eder, 2011]
Software Power Analysis
[Figure. Source: Eder, 2011]

Tractable Software Power Analysis
[Figure. Source: Eder, 2011]

The Tool Challenge
How do we do high level power estimation?
Existing power analysis tools
  – operate at gate level
  – accurate, but very slow
Instruction level power analysis
  – Tiwari et al, 1996
  – highly parameterized formulaic approach
  – no data on accuracy
Wattch: An architectural power analysis tool
  – Brooks et al, 2000
  – requires parameterized models of common functional blocks
  – accurate to ±10%, relatively fast

What Have We Learned So Far?
Use a low power instruction encoding
  – minimizing the Hamming distance between consecutive instructions reduces switching
  – ISA is very target application specific (requires profiling to create)
  – up to 62% reduction in opcode switching observed
  – Woo et al, 2001
Partition the register file
  – 25% of registers account for 83% of access time
  – partition into “hot” and “cold” registers
  – average 54% energy savings compared to non-partitioned
  – Guan & Fei, 2010

Higher Abstraction Still
Specialist programming languages
  – allow the programmer to exploit power/accuracy tradeoffs
  – use approximate types where appropriate
  – programmer annotates the code
  – type checker separates out approximate and accurate code
  – Sampson et al 2011
Hardware support for approximate computation
  – variable bit width in floating point calculations
  – up to 66% power saving
  – Tong et al 2000
These are niche examples.
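
The low power instruction encoding result rests on counting bit flips between consecutive opcodes. As a rough illustration only, the sketch below compares two made-up 4-bit encodings over a made-up instruction trace. The opcode values, the trace and the function names are all assumptions for illustration and are not taken from Woo et al.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two encodings differ."""
    return bin(a ^ b).count("1")


def total_switching(opcodes):
    """Sum of bit flips between each consecutive pair of opcodes."""
    return sum(hamming_distance(x, y) for x, y in zip(opcodes, opcodes[1:]))


# Hypothetical 4-bit opcode assignments for three instructions.
naive = {"add": 0b0000, "load": 0b1111, "store": 0b0101}
tuned = {"add": 0b0000, "load": 0b0001, "store": 0b0011}

# Hypothetical profile of the target application's instruction stream.
trace = ["load", "add", "store", "load", "add", "store"]

naive_flips = total_switching([naive[i] for i in trace])
tuned_flips = total_switching([tuned[i] for i in trace])

print("naive encoding bit flips:", naive_flips)
print("tuned encoding bit flips:", tuned_flips)
```

For this particular trace the second assignment halves the number of opcode bit flips, which hints at why such an encoding has to be derived by profiling the target application.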
What is the generic solution?
  – key is tools that allow software designers to explore solutions
  – profile energy usage as easily as performance
      tools from Lauterbach and research by Steve Kerrison at Bristol University

The Energy Aware Computing Initiative (EACO)
Ultimate goal is a European wide program of research
Led by the Institute for Advanced Studies at Bristol University
Kicked off with three dedicated workshops in 2011
  – http://www.bristol.ac.uk/ias/workshops/current-workshops/energy-aware-computing.html
Intellectual challenges
  – incremental improvements
  – radically new innovative approaches
Conveners
  – Dr Kerstin Eder
  – Prof David May
More collaborators wanted
  – contact Kerstin Eder

Hardware and Software Working Together
Unified System Debug

A Typical SoC Design Flow
[Diagram header, as extracted: Exploration & System Implementation Silicon System Design Verification]
Hardware Team: SystemC TLM; Simulation & CA Model; FPGA or Emulation; Silicon
Software Team: ISS + Debugger; ISS + Debugger; ISS/ICE + Debugger; Silicon + Debugger (?)

The Technical Solution
Embedded software tools need two key features
  – they must be “peripheral aware”
      when the program halts, the peripherals halt
      the debugger has visibility into the peripherals
  – they must work with models as well as final silicon
      models of the complete SoC
      high level, low level, software or FPGA emulation
This is not a technical challenge
  – most debuggers extend easily to peripherals
  – JTAG provides a good abstraction of the interface
  – the EDA world knows how to model SoCs
Adding “Peripheral Awareness”
Reading peripherals
  – GDB info command
    (gdb) info spr picmr
    PIC.PICMR = SPR9_0 = 0 (0x0)
    (gdb)
Writing peripherals
  – GDB set command
    (gdb) set spr picmr 0x00000007
    PIC.PICMR (SPR9_0) set to 7 (0x7), was: 0 (0x0)
    (gdb)
Watchpoints
  – new GDB command
  – depends on target abilities
    (gdb) pwatch picsr
    Peripheral watchpoint 2: PIC.PICSR (SPR9_2)
    (gdb)

Peripheral Awareness Using the GDB Remote Serial Protocol
Extend using standard Remote Serial Protocol (RSP) features
  – a reliable packet based protocol
qRcmd packet used to access peripherals
  – e.g. readspr to read a peripheral register
  – e.g. writespr to write a peripheral register
Future proof against GDB upgrades
  – RSP compatibility is always ensured

Debugging Models and Silicon
SystemC TLM 2.0
  – modeled as debug I/F
    class SocTlmModel : public sc_core::sc_module
    {
      …
      tlm::tlm_transport_dbg_if<JtagPayload> jtagPort;
Cycle accurate/simulation
  – modeled as pins
    class SocCycleModel : public sc_core::sc_module
    {
      …
      sc_in<bool> jtagTck;
      sc_in<bool> jtagTms;
FPGA/Emulation/ASIC
  – drives hardware interface
    static void jp1_ll_reset_jp1()
    {
      …
      write (lp, &data, sizeof (data));
      JP1_WAIT ();

Unified Debug Solution
[Diagram: in the firmware world, a debugger (e.g. GDB) speaks a debugger protocol (e.g. GDB RSP) to a unified JTAG abstraction layer. In the hardware world that layer reaches a hand-written TLM model through TLM 2.0 JTAG, a simulation or CA model through JTAG simulation, emulation or FPGA through a JTAG driver, and silicon through a JTAG driver.]

JTAG Abstraction: Generic Class Diagram
[Class diagram: RspServerSC, JtagSC, JtagRegister]
A generic GDB Remote Serial Protocol server communicating with a generic JTAG target by passing a generic JTAG register.
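
The RSP traffic handled by such a server is simple to frame. The sketch below is a minimal, hedged illustration of the standard packet format ("$", payload, "#", two-digit hex checksum modulo 256) and of a qRcmd query carrying a hex-encoded command string. The function names and the example command text "readspr 4800" are assumptions for illustration; the exact argument syntax accepted by a given target is not specified in these slides.

```python
def rsp_checksum(payload):
    """Two hex digits: sum of the payload bytes modulo 256."""
    return format(sum(payload.encode("ascii")) % 256, "02x")


def rsp_frame(payload):
    """Wrap a payload as $<payload>#<checksum> (escaping omitted in this sketch)."""
    return "$" + payload + "#" + rsp_checksum(payload)


def rsp_unframe(packet):
    """Check the framing and checksum of a packet and return its payload."""
    if not (packet.startswith("$") and len(packet) >= 4 and packet[-3] == "#"):
        raise ValueError("malformed RSP packet")
    payload, checksum = packet[1:-3], packet[-2:]
    if rsp_checksum(payload) != checksum.lower():
        raise ValueError("bad RSP checksum")
    return payload


def qrcmd_packet(command):
    """Build a qRcmd query; its argument is the command text encoded as hex."""
    return rsp_frame("qRcmd," + command.encode("ascii").hex())


# Hypothetical request to read a peripheral register.
request = qrcmd_packet("readspr 4800")
print(request)
print(rsp_unframe(request))
```

Because the peripheral commands travel inside an ordinary query packet, a server built this way needs no change to the framing layer when new commands are added.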
Both the RSP server and JTAG target are abstract classes. Concrete derived classes are created for specific architectures and specific JTAG targets.

JTAG Abstraction: Specific Targets
[Class diagram: RspServerSC, JtagSC and JtagRegister, plus the JTAG derived classes JtagTlmSC, JtagCycleModelSC, JtagVpiSC and Jtag2232SC]
A set of JTAG derived classes provides interfaces to common targets, independent of the processor architecture being supported.

JTAG Abstraction: Specific Architecture
[Class diagram: the classes above, plus ArchRspServerSC and three ArchJtagRegType classes]

Using SystemC TLM to model JTAG
Use extensions for JTAG specific features
  – address, bit length, command
Allows use of generic payload
  – maximum portability
[Class diagram: tlm_generic_payload (get_command (), set_command (), get_address (), set_address (), get_data_length (), set_data_length (), get_extension ()) with three optional (0..1) extensions: JtagPayloadCommand (getCommand (), setCommand ()), JtagPayloadAddress (getAddress (), setAddress ()) and JtagPayloadBitLength (getBitLength (), setBitLength ()). Enumeration values shown: IGNORE, READ, WRITE, WRITE_READ, RESET, IR, DR]

Example Adapter
[Diagram: a TLM thread makes a blocking TLM call into the adapter; a cycle accurate thread runs the TAP state machine and drives the SystemC signals tck, tdi, tdo, tms and trst; the adapter then makes the blocking TLM return]
Reference: Application Note EAN5

Experience with OpenRISC
The OpenRISC 1000 Project
Objective to develop a family of open source RISC designs
  – 32 and 64-bit architectures
  – floating point support
  – vector operation support
Key features
  – fully free and open source
  – linear address space
  – register-to-register ALU operations
  – two addressing modes
  – delayed branches
  – Harvard or Stanford memory MMU/cache architecture
  – fast context switch
Looks rather like MIPS or DLX

The OpenRISC 1200
32-bit Harvard RISC architecture
  – MIPS/DLX like instruction set
  – first in OpenRISC 1000 family
  – originally developed 1999-2001
Open source under the
  – GNU Lesser General Public License
  – allows reuse as a component
Configurable design
  – caches and MMUs optional
  – core instruction set
Source code Verilog 2001
  – approx 32k lines of code
Full GNU tool chain and Linux port
  – various RTOS ported as well
[Block diagram: OpenRISC 1200 with Power Mgmt, Inst MMU, Inst Cache, JTAG, Debug Unit, CPU, ALU, Tick Timer, Data MMU, Data Cache, PIC and two Wishbone interfaces]

Hardware Development
Objective is to use an open source EDA tool chain
  – back end tools for FPGA all proprietary
      free (as in beer) versions available
  – front end tools now have open source alternatives
OpenRISC 1000 simulation models
  – Or1ksim: golden reference ISS
      C/SystemC interpreting ISS, 2-5 MIPS
  – Verilator cycle accurate model from the Verilog RTL
      130kHz in C++ or SystemC
  – Icarus Verilog event driven simulation
      1.4kHz, 50x slower than commercial alternatives
All OpenRISC 1000 simulation models suitable for SW use
  – all support GDB debug interface
The Software Tool Chain
A standard GNU tool chain
  – binutils 2.20.1
  – gcc 4.5.1
  – gdb 7.2
  – C and C++ language support
Library support
  – static libraries only
  – newlib 1.18.0 for bare metal (or32-elf-*)
  – uClibc 0.9.32 for Linux applications (or32-linux-*)
Testing
  – regression tested using Or1ksim (both tool chains)
  – or32-linux-* regression tested on hardware
  – or32-elf-* regression tested on a Verilator model

Board and OS Support
Boards with BSP implementations
  – Or1ksim
  – DE-nano
  – Terasic DE-2
RTOS support
  – FreeRTOS, RTEMS and eCos all ported
Linux support
  – adopted into Linux 3.1 kernel mainline
  – some limitations (kernel debug, ptrace)
  – BusyBox as application environment
Debug interfaces
  – JTAG for bare metal
  – gdbserver over Ethernet for Linux applications

Comparative Regression Testing of the OpenRISC 1200
=== gcc Summary ===           Golden SystemC TLM Model   Verilator SystemC RTL Model
# of expected passes          52753                      52677
# of unexpected failures      152                        228
# of expected failures        77                         77
# of unresolved testcases     122                        122
# of unsupported tests        716                        716
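
The two summaries can be compared mechanically. As a minimal sketch, assuming the DejaGnu-style "# of ..." summary lines shown above, the following parses each summary and reports the per-category difference. The function names are illustrative and not part of any tool chain.

```python
import re


def parse_summary(text):
    """Map each '# of <category> <count>' line to {category: count}."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"#\s+of\s+(.+?)\s+(\d+)\s*$", line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def summary_delta(reference, candidate):
    """Per-category change (candidate minus reference), omitting zero changes."""
    return {
        key: candidate[key] - reference[key]
        for key in reference
        if candidate.get(key, 0) != reference[key]
    }


golden_tlm = parse_summary("""
# of expected passes 52753
# of unexpected failures 152
# of expected failures 77
# of unresolved testcases 122
# of unsupported tests 716
""")

verilator_rtl = parse_summary("""
# of expected passes 52677
# of unexpected failures 228
# of expected failures 77
# of unresolved testcases 122
# of unsupported tests 716
""")

delta = summary_delta(golden_tlm, verilator_rtl)
print(delta)
```

For these figures the only change is 76 tests moving from expected passes to unexpected failures when run on the RTL model.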
We can identify two types of problem
  – tests which fail by timing out with RTL, where the cause is not simply the slower model
  – tests which give a different result with RTL
These are candidates for possible RTL errors
Used commercially by Adapteva Inc
  – 50-60 RTL errors eliminated pre-tape out

Summary and Acknowledgements

Summary
Future low power products will require a systems approach
  – hardware and software engineers must work together
  – the approach applies throughout the lifecycle
The greatest opportunity for power saving is in the software
  – techniques for tackling this are still in their infancy
  – we need breakthroughs in high level power modeling and simulation
We need a systems oriented tool chain
  – geared to the needs of both software and hardware
  – usable throughout the product lifecycle
Embecosm's unified debugging approach is an example
  – allows software debugging throughout the lifecycle
The benefits can be seen already in the OpenRISC project
  – hardware bugs identified by the software engineers

Acknowledgements
Most of the work described in the first section of this presentation on energy aware computing is due to my colleague, Dr Kerstin Eder of the University of Bristol. OpenRISC is a community project, to which I am just one of the contributors.
It is the cumulative result of 12 years' work by a very large number of people.

Thank You
www.embecosm.com
www.opencores.org