## Annex D – Additional diagrams and figures

Table I. Additional examples of state-of-the-art deep neural networks with related accuracy and hardware cost in GPU implementations.

| deep neural network | task | accuracy | memory, computational effort |
|---|---|---|---|
| ResNet [HZR2016] | 10 million images from 1,000 categories | 3.57% top-5 error rate | 12-GB memory requirement, 0.5 seconds/image in best-in-class GPUs |
| ResNet + Faster R-CNN [HZR2016] | detects objects from 20 different categories | 83.8% | 12-GB GPU memory and 0.5 seconds/object detection on 256 × 256 images (similar to [DLH2016], [LAE2015]) |
| DeepLab-V2 [CPK2016] | segments images containing various objects from 20 categories | 79.7% | 6-GB memory cost and 2 seconds/image with best-in-class GPUs |
| VGG16 [ZLL2016] | pedestrian detection | 9.6% miss rate | 6-GB memory and 0.5 seconds/pedestrian |
| multi-domain network [NH2016] | tracks objects across video frames | 3% error rate | 6-GB memory and 1 frame/s in best-in-class GPUs |
| RCNN-based face detector [CHW2016], [WOJ2015] | detects and aligns faces | 98.35% | 6-GB memory and 30 images/s in best-in-class GPUs |
| GoogLeNet [SKP2015], [WZL2016] | face verification | 99.63% | 6-GB memory, 20 images/s rate in best-in-class GPUs |
| GoogLeNet [ZLL2016], [ZGW2016] | facial emotion classification | 97.3% | 6-GB memory, 20 images/s rate in best-in-class GPUs |



[Fig. D1 panel labels: Internet of (visual) Things; object detection/classification, visual indexing (heavy loads, vehicles...); crowd volume/motion monitoring for predictive infrastructure/access management; warehouse management; augmented-reality information-enriched surveillance; object removal detection; targeted human search, person/face recognition, occupancy monitoring; ubiquitous surveillance (abandoned object detection...); human activity monitoring (congestion, congregation, loitering); neighborhoods; intelligent/predictive transportation (MRT...); vehicle/pedestrian danger prediction (e.g., collision); crowd sentiment monitoring (normal, frantic, panicking...); vision in autonomous ultralightweight UAVs; monitoring for high-productivity manufacturing]

Fig. D1. Societal impact of CogniVision: examples of applications that are enabled by (or benefit from) cognitive cameras.



Fig. D2. Memory size and power requirements of GPU-scale vision and CogniVision chip-scale vision, with an example (face recognition).



Fig. D3. Industrial interest in embedded vision is growing rapidly, as attested by the large number of enterprises that have joined the Embedded Vision Alliance [EVA].



Fig. D4. General architectures for untethered cameras. CogniVision adopts the "cognitive&attentive" architecture to drastically reduce the radio-frequency transmitted power, and enables continuous responsiveness to cloud requests through an ultra-low-power always-on receiver.



Fig. D5a. Power consumption of state-of-the-art imagers for mobile applications and additional wireless power (assuming an optimistic 5 nJ/bit, representative of best-in-class radios [ITT16]). For a fair comparison, the power of each imager in these plots is scaled to VGA format at 30 frames/second, optimistically assuming that the energy/pixel stays the same at that resolution and frame rate.
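The two normalizations named in this caption can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the 8 bits/pixel raw format and the example imager (10 µW at 128 × 128 pixels and 1 fps) are assumptions introduced here, not values from the proposal; only the 5 nJ/bit radio energy and the VGA/30-fps reference point come from the caption.

```python
# Back-of-the-envelope radio power for streaming raw VGA video, plus the
# constant-energy/pixel scaling used to compare imagers.
VGA_WIDTH, VGA_HEIGHT = 640, 480   # VGA resolution in pixels
FRAME_RATE_FPS = 30                # reference frame rate from the caption
RADIO_ENERGY_J_PER_BIT = 5e-9      # 5 nJ/bit, the optimistic figure from the caption
BITS_PER_PIXEL = 8                 # ASSUMPTION: 8-bit raw pixels (not stated in the source)


def wireless_power_w(width, height, fps, bits_per_pixel, energy_per_bit_j):
    """Radio power (W) needed to transmit every pixel of every frame, uncompressed."""
    bits_per_second = width * height * bits_per_pixel * fps
    return bits_per_second * energy_per_bit_j


def scale_imager_power_w(power_w, width, height, fps):
    """Scale a reported imager power to VGA at 30 fps, keeping energy/pixel constant."""
    energy_per_pixel_j = power_w / (width * height * fps)
    return energy_per_pixel_j * VGA_WIDTH * VGA_HEIGHT * FRAME_RATE_FPS


radio_w = wireless_power_w(
    VGA_WIDTH, VGA_HEIGHT, FRAME_RATE_FPS, BITS_PER_PIXEL, RADIO_ENERGY_J_PER_BIT
)
print(f"raw VGA @ 30 fps radio power: {radio_w * 1e3:.1f} mW")

# HYPOTHETICAL imager, for illustration only: 10 uW at 128 x 128 pixels and 1 fps.
scaled_w = scale_imager_power_w(10e-6, 128, 128, 1)
print(f"hypothetical imager scaled to VGA @ 30 fps: {scaled_w * 1e3:.3f} mW")
```

Under the 8-bit assumption, transmitting raw VGA video at 30 frames/second costs roughly 369 mW of radio power at 5 nJ/bit, which is hundreds of times above a sub-mW budget before the imager itself is counted.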



Fig. D5b. Power consumption of state-of-the-art ultra-low-power imagers for always-on cameras and additional wireless power. As a result, architecture #1 in Fig. 3 is unsuitable for a sub-mW power budget.



Fig. D5c. Power consumption of state-of-the-art imagers with ultra-low-power multi-mode operation (e.g., high-accuracy mode activated only if illumination changes) and additional wireless power. Again, architecture #1 in Fig. 3 is unsuitable for a sub-mW power budget.



Fig. D5d. Power consumption of state-of-the-art imagers with some limited form of sensemaking (e.g., image sensing is inhibited when no motion is detected, which reduces power) and additional wireless power. Again, architecture #1 in Fig. 3 is unsuitable for a sub-mW power budget.



Fig. D6. The power consumption of state-of-the-art image compression accelerators (e.g., MPEG) alone exceeds the power target of untethered cameras. As a result, architecture #2 in Fig. 3 is unsuitable for a sub-mW power budget.



[Fig. D7 plot title: power consumption of state-of-the-art engines for scene sensemaking (all scaled to VGA, 30 fps for fair comparison)]

Fig. D7. The power consumption of state-of-the-art engines for sensemaking (e.g., deep learning, object recognition) alone exceeds the power target of untethered cameras. As a result, architecture #3 in Fig. 3, when built from existing stand-alone components, is unsuitable for a sub-mW power budget (i.e., system co-design is necessary to further reduce power).



Fig. D8. Integrated research prototypes: the power consumption of complete vision systems is invariably beyond 10 mW when fairly scaled to the same VGA resolution and 30-fps frame rate. The demonstrations that are in the mW range have very limited computational capability (tens of MOPS, compared to the targeted 20,000 MOPS), which only allows for shooting a picture upon the occurrence of simple events. CogniVision aims to fill this gap, allowing mW power while assuring suitability for a wide range of applications, as permitted by the reprogrammable deep learning accelerator and a throughput adequate to complete meaningful vision tasks at the targeted 30-fps frame rate.
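To make the throughput target in this caption more concrete, the sketch below converts 20,000 MOPS at 30 fps into operations per frame and per VGA pixel, and into the energy efficiency that a given power budget would imply. The 1 mW budget is an assumption used here as a stand-in for "mW power"; the caption does not state an exact figure.

```python
# Translate the targeted throughput into per-frame, per-pixel and efficiency terms.
TARGET_MOPS = 20_000        # targeted throughput from the caption (million operations/s)
FRAME_RATE_FPS = 30         # targeted frame rate from the caption
VGA_PIXELS = 640 * 480      # VGA resolution
POWER_BUDGET_W = 1e-3       # ASSUMPTION: 1 mW, a stand-in for the caption's "mW power"

ops_per_second = TARGET_MOPS * 1e6
ops_per_frame = ops_per_second / FRAME_RATE_FPS
ops_per_pixel = ops_per_frame / VGA_PIXELS

# Energy efficiency needed to sustain the target within the assumed budget.
tops_per_watt = ops_per_second / POWER_BUDGET_W / 1e12

print(f"operations per frame: {ops_per_frame / 1e6:.0f} MOP")
print(f"operations per VGA pixel: {ops_per_pixel:.0f}")
print(f"required efficiency at {POWER_BUDGET_W * 1e3:.0f} mW: {tops_per_watt:.0f} TOPS/W")
```

With these numbers the accelerator has about 667 million operations available per frame (about 2,170 per VGA pixel), and sustaining that within the assumed 1 mW corresponds to 20 TOPS/W.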



Fig. D9. PCB-assembled research prototypes: cameras in real conditions consume much more than 1 mW, and are hence unsuited as energy-autonomous cameras.



Fig. D10. Battery lifetime of untethered cameras (PCB-assembled). Most of them were released during the review of the white paper of this proposal, showing a very broad interest in untethered cameras. Lifetimes are taken from the datasheet or, where available, from Amazon user reviews. These lifetimes are inadequate for distributed sensing, which motivates the "CogniVision" research program, which aims to enable nearly-perpetual lifetime via energy harvesting in a small form factor (<<100 mm<sup>3</sup>).
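The link between average power and battery lifetime can be sketched as follows. This is a simplified estimate: the battery (3,000 mAh at 3 V) is a hypothetical example, and self-discharge and converter losses are ignored. The 3 mW and 25 mW operating points are the figures quoted for [BLK16] in Table II; 1 mW is included for comparison.

```python
def battery_lifetime_days(capacity_mah, voltage_v, avg_power_w):
    """Ideal lifetime in days: stored energy divided by average power draw.

    Ignores self-discharge, converter efficiency and capacity derating.
    """
    energy_wh = capacity_mah / 1000.0 * voltage_v
    return energy_wh / avg_power_w / 24.0


# HYPOTHETICAL battery for illustration: 3,000 mAh at 3 V (9 Wh).
CAPACITY_MAH = 3000
VOLTAGE_V = 3.0

for label, power_w in [("1 mW", 1e-3), ("3 mW", 3e-3), ("25 mW", 25e-3)]:
    days = battery_lifetime_days(CAPACITY_MAH, VOLTAGE_V, power_w)
    print(f"{label:>6}: {days:6.0f} days")
```

Even with this idealized battery, lifetime drops from about 125 days at 3 mW to about 15 days at 25 mW, which illustrates why lifetime in realistic scenes is much shorter than in a fixed scene.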

Table II. Leading researchers and their research work in areas related to the "CogniVision" program (prominent researchers highlighted in bold).

| | institution / company | leading researchers | publication samples | research scope and limitations |
|---|---|---|---|---|
| ultra-low power imagers | University of Michigan, Ann Arbor (USA) | E. Yoon | [CPC15], [CPC14], [CPC12], [CHK07] | adaptive imagers with some embedded low-level intelligence, but ignores the fundamental problem of very large wireless power |
| | HKUST (Hong Kong) | A. Bermak | [TCW13], [LBS11], [CBW11], [TCB10] | sub-mW power achieved only in imagers with extremely poor resolution (e.g., 128 × 96 pixels), inadequate in real applications |
| | University of Michigan, Ann Arbor (USA) | D. Blaauw, D. Sylvester | [KLF14], [KBF13], [HFB10], [HS09] | ultra-low power achieved only in imagers with very low resolution (e.g., 128 × 128 pixels) and frame rate (e.g., 0.5 fps), and very limited processing for event-triggered picture shooting (tens of MOPS), both inadequate in targeted applications |
| | NTU (Singapore) | S. Chen, KS. Low, H. Zhuang | [ZZC12] | ultra-low power achieved only in imagers with very low resolution (e.g., 64 × 64 pixels), sensemaking heavily constrained by the event-driven sensing framework (deep learning and state-of-the-art video processing algorithms cannot be applied) |
| | Samsung Advanced Institute of Technology (Korea) | DS. Park | [CSK15] | multi-mode imagers, but ignores the fundamental problem of very large wireless power |
| | University of Idaho (USA) | S. U. Ay | [A11], [A11b] | ultra-low power achieved only in imagers with extremely poor resolution (e.g., 50 × 50 pixels), ignores the fundamental problem of very large wireless power |
| | Purdue University (USA) | E. Culurciello | [CTZ12] | includes ultra-low power radio, but power is still 10X larger than needed; ultra-low power achieved only in imagers with extremely poor resolution (e.g., 64 × 64 pixels), inadequate in targeted applications |
| | Johns Hopkins University (USA) | R. Etienne-Cummings | [CMC07] | ultra-low power achieved only in imagers with extremely poor resolution (e.g., 90 × 90 pixels), sensemaking heavily constrained by the event-driven sensing framework (deep learning and state-of-the-art video processing algorithms cannot be applied) |
| | FBK (Italy) | M. Gottardi | [GMJ09] | ultra-low power achieved only in imagers with extremely poor resolution (e.g., 128 × 64 pixels), sensemaking heavily constrained by the event-driven sensing framework (deep learning and state-of-the-art video processing algorithms cannot be applied) |
| | ARC-sr (Austria) | T. Delbruck | [LPD08] | ultra-low power achieved only in imagers with extremely poor resolution (e.g., 128 × 128 pixels), sensemaking heavily constrained by the event-driven sensing framework (deep learning and state-of-the-art video processing algorithms cannot be applied) |
| | UC Louvain (Belgium) | D. Bol, N. Couniot | [BDB14] | focused on imager, ignores the fundamental problem of very large wireless power |
| | NTHU (Taiwan) | CC. Hsieh | [CLY13] | focused on imager, ignores the fundamental problem of very large wireless power |
| | Yonsei University (Korea) | J. Lee, G. Han | [CLL10] | focused on imager, ignores the fundamental problem of very large wireless power |
| | Nara Institute of Science and Technology (Japan) | M. Nunoshita, J. Ohta | [KSN08] | focused on imager, ignores the fundamental problem of very large wireless power, ultra-low power achieved only in imagers with extremely poor resolution (e.g., 128 × 96 pixels) |
| | Himax Technologies, Inc. (Taiwan) | N/A | [HM16] | ultra-low camera power only in environments with virtually no motion in the scene, unsuitable for public spaces (excessive power) |
| | OmniVision Technologies, Inc. (USA) | N/A | [OV15] | ultra-low camera power only in environments with low/steady lighting and no motion in the scene, unsuitable for public spaces (excessive power) |
| | Gdansk University of Technology (Poland) | R. Piotrowski | [JBJ13] | low-power low-resolution imagers with on-chip low-level analog feature extraction, no mid/high-level sensemaking, no reprogrammability |
| | Columbia University (USA) | | [G15], [NSF15] | very large (>10 cm), no intelligence, ignores the fundamental problem of very large wireless power |
| accelerators for scene sensemaking | KAIST (Korea) | HJ. Yoo, Lee-Sup Kim | [SLL17], [PCL16], [SPK16], [PBS15], [HBS15], [O13], [P13], [WSK08], [KLK08], [LKK11], [LOK10], [HPP15], [KKL14], [LKK08], [OLK09], [OPK11], [OKP12] | focused on sensemaking only (no imager/cameras), high accuracy at power consumption 100X larger than allowed in targeted applications |
| | MIT | V. Sze | [CKR16] | focused on sensemaking only (no imager/cameras), power consumption 10X larger than allowed in targeted applications |
| | Toshiba (Japan) | H. Hayashi, T. Miyamori | [SPK16] | focused on sensemaking only (no imager/cameras), power consumption 100X larger than allowed in targeted applications |
| | NTU (Taiwan) | LG. Chen | [CHW15] | focused on sensemaking only (no imager/cameras), power consumption 100X larger than allowed in targeted applications |
| | ETHZ (Switzerland) | L. Benini | [RRL16], [LLR16], [PCR17] | efficient architectures for low-power triggering, processing/sensemaking not as efficient as best-in-class accelerators for deep learning; hierarchical processing is explored |
| | KULeuven (Belgium) | M. Verhelst | [MV16], [MV17] | general-purpose energy-efficient accelerators for deep learning with scalable precision (but no automatic quality control) |
| | STMicroelectronics (France) | N/A | [DCB17] | general-purpose energy-efficient accelerators for deep learning with scalable precision (but no automatic quality control) |
| | Stanford University | M. Horowitz, W. Dally | [HLM16] | general-purpose energy-efficient accelerators for deep learning for high performance, and energy efficiency not on par with best in class |
| | Hokkaido University | M. Motomura | [UAH18] | TSV-less 3D stacked deep learning acceleration for high-speed, highly-parallel systems |
| | INRIA | O. Temam | [DFC15], [CLL14], [CDS14], [LCL15] | focused on sensemaking only (no imager/cameras), power consumption >100X larger than allowed in targeted applications |

| | institution / company | leading researchers | publication samples | research scope and limitations |
|---|---|---|---|---|
| imagers for mobile platforms (related, but different application domain) | TSMC (Taiwan) | C. Chao, FL. Hsueh | [LMC16] | focused on imager for mobile applications (large power, ignores the problem of very large wireless power) |
| | NHK Science & Technology Research Labs (Japan) | T. Hayashida, H. Shimamoto | [F15] | focused on imager for mobile applications (large power, ignores the problem of very large wireless power) |
| | SONY (Japan) | Y. Inada, H. Wakabayashi, T. Hirayama, N. Fukushima | [S15], [S13], [KNH18], [HNH17], [NSM18] | focused on imager for mobile applications (large power, ignores the problem of very large wireless power), and recently on 3D stacking |
| | Toshiba (Japan) | R. Okamoto, S. Kousai | [D13] | focused on imager for mobile applications (large power, ignores the problem of very large wireless power) |
| | Shizuoka University (Japan) | S. Kawahito | [S12] | resolution-scalable, but focused on imager for mobile applications (large power, ignores the problem of very large wireless power) |
| | Samsung Electronics (Korea) | CY. Choi, GS. Han | [K12] | focused on imager for mobile applications (very large power) |

| | institution / company | leading researchers | publication samples | research scope and limitations |
|---|---|---|---|---|
| complete vision system from image to sensemaking | GeorgiaTech | J. Romberg, <b>A. Raychowdhury</b>, S. Mukhopadhyay | [XCR16], [AXC16], [DSR15] | photovoltaic cell-powered always-on camera with gesture recognition capability, 4-5 cm wide + 7 cm-wide solar cell, 100s of mW power |
| | Université Blaise Pascal (France) | C. Bourrasset | [BMS13] | wired camera (no wireless communication, not energy autonomous), 6-7 cm wide |
| | Carnegie Mellon University | N/A | [CMU14] | wired camera (no wireless communication, not energy autonomous), 5.5 cm wide |
| | ETHZ (Switzerland) | L. Benini | [KML07], [MTB13], [RRF17] | focused on low-power trigger and hence on the low end of vision sensors; processing/sensemaking not as efficient as best-in-class accelerators for deep learning; hierarchical processing is explored |
| | KAIST (Korea) | H. J. Yoo | [BCK17], [MTB13] | low-end and application-specific systems with limited computational capability and no reprogrammability (e.g., fixed face recognition) |
| | Sony (Japan) | N/A | [YKU17] | 3D stacked image sensor and processor for high-performance/high-speed imaging (unsuited for distributed vision) |

| | institution / company | leading researchers | publication samples | research scope and limitations |
|---|---|---|---|---|
| | University of Manchester (UK) | P. Dudek | [CBD13], [LD13], [CBD11] | low-end and application-specific systems with limited computational capability and no reprogrammability (e.g., loiterer detection), or high-speed high-power smart imagers (unsuited for distributed vision) |
| | Fraunhofer Institute (Germany) | N/A | [F11] | untethered camera with compression, 8 cm long, very large power (4 W) |
| startups and large enterprises in 2015-2016 | Blink | N/A | [BLK16] | no intelligence (only sends 10-s clips when motion is detected), low power (3 mW) only in unrealistically fixed scene, very large power (25 mW) in public spaces and other realistic conditions, 7 cm wide |
| | HomeBoy | N/A | [HMB16] | no intelligence (only sends 30-s clips when motion is detected), 2-month operation in unrealistically fixed scene (much shorter in realistic conditions), 7 cm wide |
| | Butterfleye | N/A | [BFL16] | motion detector, limited intelligence to discard false events, sends (or records) up to 30-s clips when triggered, 2-week operation, 9 cm wide |
| | Google CLIPS | N/A | [CLP17] | limited intelligence to trigger video shooting upon event occurrence, but no interaction with cloud, no control on the type of events |
| | Knit Health | N/A | [KNT17] | limited intelligence to trigger video shooting upon motion detection, limited to recording (no interaction with cloud, no control on the type of events) |
| | Arlo (Netgear) | N/A | [ARL15] | motion detector, sends (or records) up to 30-s clips when triggered (no intelligence), 3-6 month operation in unrealistically fixed scene (much shorter in realistic conditions), 7 cm wide |

### Table III. Recent and on-going worldwide research programs on areas related to CogniVision.

| research program title | funding agency | year of completion | scope | limitations, differences |
|---|---|---|---|---|
| Reconfigurable Imaging (ReImagine) [REC16] | DARPA (USA) |  | different regions of interest | focused on imager only, no sensemaking and vision, high-performance imagers (large power, not suited for untethered cameras) |
| Hercules (High-Performance Real-time Architectures for Low-Power Embedded Systems) | H2020 (EU) | 2020 | integrated framework for cutting-edge heterogeneous multi-core platforms for real-time computation | one of the two targeted applications is a visual recognition system for the avionic domain, focused on very high speed computation, not on complete and ultra-low power vision systems |
| MicroLearn: Micropower Deep Learning | Swiss National Foundation | N/A (~2020) | ultra-low power accelerators for deep learning | focused on deep learning accelerators, no integration of complete vision systems |
| Smart Cyber-Physical Systems | H2020 (EU) | 2020 |  | focused on low-end sensing platforms, no focus on general-purpose ubiquitous deep learning accelerators |
| Visual Cortex on Silicon [NSF13] | National Science Foundation (USA) | 2018 | understanding the fundamental comprehension mechanisms used in the visual cortex | no camera demonstration, no chip demonstration, only sensemaking based on simulations and off-the-shelf components |
| Systems of Neuromorphic Adaptive Plastic Scalable Electronics (SyNAPSE) [SYN09] | (USA) |  | power-saving efficiency, with 100x [...] processing than state-of-the-art chips (spurred TrueNorth neuromorphic chip) | focused on large-scale compute-intensive sensemaking (cloud/datacentre level), no camera |
| COgnitive & Perceptive CAMeraS [COP13] | European Union | 2016 | ultra-low power computer architectures for cameras, based on many-core/GPU platform, focused on application-network-software-architecture interface | no imagers/cameras, no specialized hardware, very large power (Watt range) [COP13b], no chip demonstration (only FPGA prototyping) |
| Vision-in-Package [CSE15] | Swiss National Science Foundation (Switzerland) |  | [...] with imager and ARM Cortex M4, perform face detection, facial landmark tracking, person [...] | wired camera (not energy autonomous, not ubiquitous), 1.85-cm wide, assembled on printed circuit board, 3–4 fps |
| IcyCAM [CSE15b] | CSEM (Switzerland) | 2015 | single-chip miniaturized camera | wired camera (not energy autonomous, not ubiquitous), imager and general-purpose processor integrated on same silicon chip, but much larger power (80 mW at ¼ of VGA) |
| Supervised Autonomous Fires Technology (SAF-T) [SAF13] | Office of Naval Research (USA) | 2015 | visual processing algorithms and hardware/software platform for remote weapon stations (targeting, tracking and fire control) | focused on algorithms, no camera demonstration, no chip demonstration, only sensemaking based on simulations and off-the-shelf components |
| NeoVision2 [NEO09] | DARPA (USA) | 2012 | focused on neuroscience-inspired visual algorithms for detection, recognition, and tracking of many different classes of objects in live video imagery | focused on algorithms inspired by the design principles employed by mammalian vision systems, no camera demonstration, no chip/hardware demonstration |



Fig. D11. The three dimensions of innovation in CogniVision.





Fig. D12. a) General Dyadic Digital Pulse Modulation (DDPM) operation [C17]. Interestingly, the DDPM modulation can be effectively used to perform products, weighted sums (and hence convolutions for deep learning) with very low hardware cost, which consists of simple pulse counters (see on the right side of the figure). b) CogniVision leverages this fundamental and new observation to simplify each neuron into a counter, replacing the conventional energy-hungry method to compute convolution through multiply and accumulate.

c) Example of numerical simulation showing that the computational complexity and the relative accuracy are independent of the number of weighted products (i.e., complexity of the network), thanks to the DDPM approach. In this example, the results of 128 sets of N=256-term weighted sums and N=1024-term weighted sums are computed according to the proposed DDPM technique, and are compared with the results of the conventional computation, showing an error which is almost always less than 2% in both cases, independently of the number of weights. The targeted error can be easily reduced (1 additional bit of accuracy for doubled W) by increasing W, at the expected cost of increased computation time.

d) Resulting DDPM architecture of deep learning accelerators (see preliminary results in relevant section with 50TOPS/W expected energy efficiency in 28nm CMOS technology).
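As a rough illustration of how counting pulses can stand in for multiplication, the Python sketch below builds the dyadic pulse stream of an integer and approximates a product by counting the pulses that fall inside a window whose length is set by the weight. The window-based product, the function names and the 8-bit word length are assumptions made for illustration only; the figure does not specify the exact CogniVision circuit.

```python
from itertools import islice


def ddpm_stream(n, bits):
    """Yield the 2**bits-slot dyadic pulse stream of integer n (0 <= n < 2**bits).

    Slot t carries input bit i = bits-1-j, where j is the index of the lowest
    set bit of t+1, so bit i owns 2**i evenly spread slots (the last slot is idle).
    """
    for t in range(1 << bits):
        s = t + 1
        j = (s & -s).bit_length() - 1
        i = bits - 1 - j
        yield (n >> i) & 1 if i >= 0 else 0


def count_product(x, w, bits):
    """Approximate x*w / 2**bits by counting pulses of x in the first w slots."""
    return sum(islice(ddpm_stream(x, bits), w))


def counter_weighted_sum(xs, ws, bits):
    """Accumulate all terms in one counter: approximates sum(x*w) / 2**bits."""
    return sum(count_product(x, w, bits) for x, w in zip(xs, ws))


if __name__ == "__main__":
    bits = 8
    xs, ws = [200, 37, 129], [128, 64, 250]
    approx = counter_weighted_sum(xs, ws, bits)
    exact = sum(x * w for x, w in zip(xs, ws)) / 2**bits
    print(approx, exact)
```

Because the pulses of each bit are spread evenly over the period, a full-period count returns the input value exactly, and a partial window scales it with an error of at most one count per input bit in this sketch.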






Fig. D13. a) Example with CIFAR10-trained neural network based on conventional uniform (U) and proposed nonuniform (NU) precision across neurons in convolutional layer #3 (results in other layers are equivalent or better). For a given accuracy over CIFAR-10 benchmark, non-uniform precision allows 5-10X reduction in complexity (i.e., overall number of computed bits, and hence gate count) compared to conventional uniform.

b) The penalty of non-uniform precision training is a 10X increase in the offline training time. This increase is amortized across all devices performing inference. It can be dealt with by using commercially available cloud services (e.g., Amazon), which allow the server speed to be temporarily scaled up for training at a larger cost. In other words, non-uniform precision allows a tradeoff between cost at training time (usually very small, in view of the large number of devices sharing the same network) and the complexity and power at inference time.



#### PROPOSED 2-PHASE NETWORK POWER-AWARE COMPRESSION APPROACH

#### Phase I: hard thresholding over connections and sub-network fine-tuning.

Apply hard thresholding over the gradient magnitudes calculated at each neuron to select the most informative neurons (those with large gradient magnitude). The hard thresholding preserves the top k neurons with the largest magnitude and disables the others by zeroing their parameters. Then, fine-tune the surviving neurons to compensate for the performance loss caused by the reduction in the number of filters. The loss function is application-specific and combines both accuracy and power to achieve the desired balance between energy and quality (power-aware).

#### Phase II: neuron re-activation

The disabled neurons are re-activated and all the parameters are learned by training the entire network. The goal of this phase is to restore the truncated neurons and re-train the network to escape from some incorrectly compressed network models.

The above two phases are performed iteratively until there is no change over the neuron selection.

The final operation is the one in phase I to produce a compressed network. The proposal of such a gradient-based compression approach is based on the general intuition that the gradient magnitude passing through each neuron could reflect the "informativeness" of each neuron during the optimization process [Z16].
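The two phases above can be sketched in Python as follows. This is a minimal outline under stated assumptions, not the project's implementation: `grad_fn` and `train_fn` are hypothetical caller-supplied callables standing in for the power-aware gradient computation and the training routine, and `max_iters` is an added safeguard that the text does not mention.

```python
def top_k_mask(grad_magnitudes, k):
    """Hard thresholding: 1 for the k neurons with largest gradient magnitude, else 0."""
    ranked = sorted(range(len(grad_magnitudes)),
                    key=lambda i: grad_magnitudes[i], reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(grad_magnitudes))]


def two_phase_compress(weights, grad_fn, train_fn, k, max_iters=10):
    """Alternate Phase I and Phase II until the neuron selection stops changing.

    grad_fn(weights) -> per-neuron gradients of the (power-aware) loss.
    train_fn(weights, mask) -> weights after training the neurons with mask == 1.
    Both callables are hypothetical placeholders supplied by the caller.
    """
    mask = None
    for it in range(max_iters):
        # Phase I: keep the top-k neurons, zero the rest, fine-tune the survivors.
        new_mask = top_k_mask([abs(g) for g in grad_fn(weights)], k)
        weights = [w * m for w, m in zip(weights, new_mask)]
        weights = train_fn(weights, new_mask)
        stable = new_mask == mask
        mask = new_mask
        if stable or it == max_iters - 1:
            break  # the final operation is always Phase I
        # Phase II: re-activate the disabled neurons and train the whole network.
        weights = train_fn(weights, [1] * len(weights))
    return weights, mask


if __name__ == "__main__":
    targets = [3.0, 0.1, -2.0, 0.05]

    def toy_grad(weights):
        # Stand-in gradients with fixed per-neuron magnitudes.
        return [-t for t in targets]

    def toy_train(weights, mask):
        # Move each trainable weight halfway towards its target.
        return [w + 0.5 * (t - w) if m else w
                for w, t, m in zip(weights, targets, mask)]

    print(two_phase_compress([0.0] * 4, toy_grad, toy_train, k=2))
```

In the toy run the selection is stable after the second pass, so the loop exits right after Phase I and the two pruned weights are left at zero.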


Fig. D14. a) Model pruning to remove redundant parameters and reduce the size of a deep learning model. In this example, both the connections between different layers of the model and redundant parameters (the neurons) are pruned based on the iterative hard thresholding method. As a result, more than 50% of the parameters (shown as the connections) are pruned.

b) Details on the proposed two-phase power-aware compression approach.





Fig. D15. Use of small deep neural networks to automatically identify salient regions. In this example, the machine learning circuit automatically focuses its attention on the person in the image (highlighted in red), discarding other irrelevant regions to avoid unnecessary computation (again, a form of irrelevant computation skipping).



## novel SRAM bitcell with non-precharged bitline

Fig. D16. Novel on-chip SRAM memory bitcell with unconventional non-precharged bitline for 70-80% reduced bitline activity (and 40% reduced power) to store features, pixels and weights. As opposed to existing 6T and 8T bitcells, the proposed bitcell is able to drive the read bitline to ground and to the supply voltage, thus avoiding the need for precharge and the resulting high bitline activity encountered in conventional pre-charged SRAMs.



Fig. D17. In CogniVision, irrelevant activity is stopped at the lowest possible level of semantic understanding. The sooner it is stopped, the lower its power cost, as higher levels of semantic understanding are associated with larger power. Every task has a low activation rate (i.e., it is executed on a small fraction of the frame), reducing effective power by the same factor. The numerical example on the right refers to human detection in an indoor environment (up to 20 humans in the field of view, 500-1,000lux light level), and uses preliminary deep learning logic-level simulations and detailed power calculations/estimates reported in Table IV.
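One way to read this caption is as a duty-cycled power sum. As a sketch, with $\alpha_i$ denoting the activation rate of task $i$ and $P_i$ its power when active (both symbols are notation assumed here, not taken from the figure), the effective power would be

$$P_{\mathrm{eff}} = \sum_i \alpha_i P_i, \qquad 0 \le \alpha_i \le 1$$

so that, purely as an illustration, a task executed on 1% of the frame would contribute only 1% of its active power.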



Fig. D18. In CogniVision, each sub-system in Fig. D14 (e.g., imager, feature extractor...) generates a small relevance table (e.g., a few kb at most), which flags the frame portions/tiles where relevant activity is taking place. The relevance table is taken up by the next sub-system (e.g., feature extractor after imager) to skip computation that pertains to irrelevant regions (i.e., where the bits in the relevance table are tagged as irrelevant, which are left blank in this figure).

This mechanism involves all sub-systems to avoid the waste of power observed in conventional vision systems on a chip that re-compute the entire frame every time a single event occurs (e.g., appreciable motion in a pixel).
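The hand-off of relevance tables between sub-systems might look like the Python sketch below. The tile layout, the `run_stage` helper and the toy `mean_feature` extractor are all hypothetical names introduced for illustration; the actual table format and per-stage operations are not specified in the caption.

```python
def run_stage(tiles, relevance_in, stage_fn):
    """Apply stage_fn only to tiles flagged in relevance_in.

    Returns (outputs, relevance_out, tiles_processed). stage_fn(tile) must
    return (result, still_relevant); it is a hypothetical per-tile operation.
    """
    outputs = {}
    relevance_out = [[False] * len(row) for row in relevance_in]
    processed = 0
    for r, row in enumerate(relevance_in):
        for c, relevant in enumerate(row):
            if not relevant:
                continue  # irrelevant tile: skip the computation entirely
            result, still_relevant = stage_fn(tiles[r][c])
            outputs[(r, c)] = result
            relevance_out[r][c] = still_relevant
            processed += 1
    return outputs, relevance_out, processed


def mean_feature(tile, threshold=0.5):
    """Toy feature extractor: mean intensity, relevant if above threshold."""
    mean = sum(tile) / len(tile)
    return mean, mean > threshold


if __name__ == "__main__":
    tiles = [[[0.9, 0.8], [0.1, 0.1]],
             [[0.2, 0.3], [0.7, 0.9]]]
    relevance_from_imager = [[True, False], [True, True]]
    print(run_stage(tiles, relevance_from_imager, mean_feature))
```

Each stage both consumes a table and emits a new one, so tiles dropped early are never revisited by later, more power-hungry stages.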

#### PROPOSED CIRCUIT TO EXECUTE THE FREQUENCY-TUNED SALIENCY ALGORITHM WITHIN THE SENSOR ITSELF [AHE09]



if tile current differs from long-term average by more than threshold $\epsilon$:

- tile is salient due to significant intensity change (flag in relevance table)
- all its individual pixels are read out, being salient

if tile current is close to long-term average (within threshold $\epsilon$):

- tile is NOT salient due to negligible intensity change (unflag in relevance table)
- do not read out individual pixels, saving read-out power by 25X




Fig. D19. a) Circuit principle of the proposed in-sensor saliency detector: if the overall 5x5 pixel tile current changes significantly, it means that the intensity in the tile has changed appreciably, hence the tile is salient. In this case, the imager relevance table in Fig. D18 is updated, flagging the corresponding tile as relevant (i.e., salient). All individual pixels in the tile are read-out normally.

If the overall 5x5 pixel tile current is similar to its long-term average, no appreciable change is detected and the tile is non-salient. In this case, pixels do not need to be read out, thus reducing number of read-outs and imager power by 25X.

b) Numerical analysis of in-sensor saliency detector through benchmark in []. The precision vs recall plot for various values of the threshold $\epsilon$ in Fig. D19a shows that lower thresholds yield higher recall, at the cost of lower precision. To avoid skipping potentially salient regions, recall is more important than precision and hence needs to be favored.

The point highlighted in red ($\epsilon$=0.02) is an example of a reasonable tradeoff, where recall is quite high (92%) and precision is fairly low (33%), but still reasonable in terms of impact on power. Indeed, the resulting increase in false positives (i.e., activity of feature extractor) has minor impact on the overall power saving, since only 2-3% of tiles turn out to be salient anyway (i.e., activity and power are drastically reduced in spite of the presence of false positives).
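The tile-level decision in Fig. D19a can be mimicked in software as in the Python sketch below. Several details are assumptions made for illustration: tile currents are taken as normalized to [0, 1], the threshold is applied as an absolute difference on that scale, and the long-term average is tracked with an exponential moving average whose factor `alpha` is a hypothetical parameter.

```python
def detect_salient_tiles(tile_current, long_term_avg, eps=0.02, alpha=0.05):
    """Flag tiles whose current departs from the long-term average by more than eps.

    Returns (relevance_table, updated_long_term_avg). Both inputs are 2-D
    lists with one normalized value per 5x5 tile (assumed layout).
    """
    relevance = []
    new_avg = []
    for cur_row, avg_row in zip(tile_current, long_term_avg):
        relevance.append([abs(c - a) > eps for c, a in zip(cur_row, avg_row)])
        # Assumed update rule: exponential moving average of the tile current.
        new_avg.append([(1 - alpha) * a + alpha * c
                        for c, a in zip(cur_row, avg_row)])
    return relevance, new_avg


def pixel_readouts(relevance, tile_size=5):
    """Individual pixel read-outs needed: only salient tiles are read pixel by pixel."""
    salient = sum(flag for row in relevance for flag in row)
    return salient * tile_size * tile_size


if __name__ == "__main__":
    current = [[0.50, 0.51], [0.80, 0.50]]
    average = [[0.50, 0.50], [0.50, 0.50]]
    table, average = detect_salient_tiles(current, average)
    print(table, pixel_readouts(table))
```

In this toy frame only one of four tiles is flagged, so 25 pixel read-outs are needed instead of 100.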



Fig. D20. Architecture of always-on receiver at the ISM band of 2.4GHz.

The receiver power in the always-on part is estimated to be $300$-$350\,\mu$W from preliminary simulations in 180nm CMOS. The receiver and the transmitter are expected to consume 2mW when ON, but their infrequent activation reduces their average power by two orders of magnitude (i.e., a few $\mu$W), under realistic activation rates in the order of 0.01% (i.e., communication between cloud and camera occurs every 10,000 frames, or equivalently every 33 seconds - or longer - at 30 frames/s).



Fig. D21. In-principle architecture of CogniVision. The System on Chip communicates with the external world through a radio transceiver, which is connected to the low-performance microprocessor managing the chip settings via a) a programming interface that provides the settings (including the weights for deep learning) as per the cloud's requests, b) an output interface for wireless transmission (e.g., ZigBee).

Fig. D22. Gantt chart: project launch, integration, exploration & demonstration, energy-centric techniques

(Mx.y = milestone y in sub-project x; Dx.y = deliverable y in sub-project x)

|  |  | Y1 Q1 | Y1 Q2 | Y1 Q3 | Y1 Q4 | Y2 Q1 | Y2 Q2 | Y2 Q3 | Y2 Q4 | Y3 Q1 | Y3 Q2 | Y3 Q3 | Y3 Q4 | Y4 Q1 | Y4 Q2 | Y4 Q3 | Y4 Q4 | Y5 Q1 | Y5 Q2 | Y5 Q3 | Y5 Q4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Project<br>launch -<br>phase 0 | Hiring, procurement and collaborative SW environment setup |  | M0.1 (D0.1) |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | 0.1 Recruitment of majority of the manpower |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | 0.2 Requisition of major equipment essential for the Programme |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | 0.3 Setup of collaborative SW environment |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Sub-<br>project 1              | System modeling,<br>exploration, integration,<br>demonstration                                                                |        |             |            |      |   |    |    |      |        |        |    |      |        |    |    |      |    |      |      |      |
|                                | 1.1 System simulation/modeling framework                                                                                      |        |             |            |      |   |    |    | D1.1 |        |        |    |      |        |    |    |      |    |      |      |      |
|                                | 1.2 System on board (SoB)<br>integration/characterization<br>(round #1)                                                       |        |             |            |      |   |    |    |      |        |        |    | M1.2 |        |    |    |      |    |      |      |      |
|                                | 1.3 System on board (SoB)<br>integration/characterization<br>(round #2)                                                       |        |             |            |      |   |    |    |      |        |        |    |      |        |    |    | M1.3 |    |      |      |      |
|                                | 1.4 System on chip (SoC)<br>partitioning, chip level<br>simulation environment                                                |        |             |            |      |   |    |    |      |        |        |    |      |        |    |    | M1.4 |    |      |      |      |
|                                | 1.5 System on chip (SoC) optimization and integration                                                                         |        |             |            |      |   |    |    |      |        |        |    |      |        |    |    |      |    | D1.5 |      |      |
|                                | 1.6 SoC characterization and<br>in-field validation                                                                           |        |             |            |      |   |    |    |      |        |        |    |      |        |    |    |      |    |      |      | M1.6 |
| Sub-<br>project 2              | Energy-centric circuit<br>techniques and interaction at<br>imager-sensemaking and<br>wireless-sensemaking<br>boundary         |        |             |            |      |   |    |    |      |        |        |    |      |        |    |    |      |    |      |      |      |
|                                | 2.1 Imager and transceiver architectural exploration                                                                          |        |             |            | D2.1 |   |    |    |      |        |        |    |      |        |    |    |      |    |      |      |      |
|                                | 2.2 Imager and transceiver tapeout and testing (round #1)                                                                     |        |             |            |      |   |    |    | D2.2 |        |        |    | M2.2 |        |    |    |      |    |      |      |      |
|                                | 2.3 Imager and transceiver tapeout and testing (round #2)                                                                     |        |             |            |      |   |    |    |      |        |        |    | D2.3 |        |    |    | M2.3 |    |      |      |      |
|                                | 2.4 Imager and transceiver final revision for SoC, silicon demonstration                                                      |        |             |            |      |   |    |    |      |        |        |    |      |        |    |    |      |    | M2.4 |      |      |

|  |  | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 |
|---|---|---|---|---|---|---|
|  | 2.5 Final characterization and validation |  |  |  |  | M2.5 |
| Sub-<br>project 3              | Energy-centric machine<br>learning-circuit co-design                                                    |       |       |      |      |      |
|                                | 3.1 Deep learning model<br>compression                                                                  | M3.1a | M3.1b | D3.1 |      |      |
|                                | 3.2 Energy-aware deep<br>learning network design and<br>training                                        |       |       | D3.2 |      |      |
|                                | 3.3 Saliency model                                                                                      |       |       | M3.3 |      |      |
|                                | 3.4 In-field model fine-tuning, validation and integration                                              |       |       |      |      | M3.4 |
| Sub-<br>project 4              | Irrelevant activity<br>skipping/EQ-scalable<br>sensemaking<br>circuits/architectures                    |       |       |      |      |      |
|                                | 4.1 Activity skipping architectures/circuits                                                            |       |       | M4.1 |      |      |
|                                | 4.2 EQ-scalable architectures/circuits                                                                  |       |       | M4.2 |      |      |
|                                | 4.3 Feature extraction, novelty<br>assessment, deep learning,<br>SRAM tapeout and testing<br>(round #1) |       | D4.3  | M4.3 |      |      |
|                                | 4.4 Feature extraction, novelty<br>assessment, deep learning,<br>SRAM tapeout and testing<br>(round #2) |       |       | D4.4 | M4.4 |      |
|                                | 4.5 Final characterization and validation                                                               |       |       |      |      | M4.5 |
| Project<br>control –<br>task 5 | Project control and reviews                                                                             |       |       |      |      |      |
|                                | 5.1 Internal review meetings with Advisory Board                                                        | M5.1  | M5.2  | M5.3 | M5.4 | M5.5 |
|                                | 5.2 Mid-term review                                                                                     |       |       | M5.6 |      |      |
|                                | 5.3 Final review                                                                                        |       |       |      |      | M5.7 |

Table IV. Detailed targets for the final demonstration and measure of the success of the project in three visual tasks (ImageNet classification, human detection and object detection). Detailed operating conditions, dataset, neural network targets and chip performance targets are provided for each of them.

| task | 1) ImageNet image classification (0.5MobileNet network [MBN17]) | 2) human detection* (detect and localize the presence of persons within a frame) | 3) object detection* (detect and localize objects of a specific category in a frame) |
|---|---|---|---|
| testing dataset                                              | public ImageNet database<br>[ILSVRC]                                                 | live scenes captured in EA<br>lobby @ NUS (additional<br>scenes from public space<br>in Singapore, subject to<br>approval) | live scenes captured in EA<br>lobby @ NUS (additional<br>scenes from public space<br>in Singapore, subject to<br>approval) |
| operating<br>condition**                                     | 500-1,000lux light level,<br>wall-projected ImageNet<br>samples                      | 500-1,000lux light level,<br>up to 20 humans in the<br>field of view                                                       | 500-1,000lux light level,<br>up to 10 objects in the field<br>of view                                                      |
| adopted network                                              | standard MobileNet<br>[MBN17]                                                        | structure similar to<br>AlexNet (5CONV+3FC),<br>retrained for human<br>detection via innovative<br>compression techniques  | structure similar to<br>AlexNet (5CONV+3FC),<br>retrained for object<br>detection via innovative<br>compression techniques |
| accuracy target                                              | 60%<br>(reference: 57.2% in<br>AlexNet)                                              | detect 85% of 300 persons<br>in a single frame                                                                             | 80% over 10 categories<br>(person, car, chair, dog,<br>bicycle, bird, bus,<br>table, motorbike, monitor)                   |
| throughput target                                            | 30fps, projected images<br>with 256x256 resolution<br>(ImageNet benchmark)           | 30fps<br>VGA resolution                                                                                                    | 30fps<br>VGA resolution                                                                                                    |
| model size (weight<br>memory)***                             | 1.3E6<br>(1.3 MB after innovative<br>network compression and<br>weight binarization) | 6E6<br>(0.75 MB after innovative<br>network compression and<br>weight binarization)                                        | 6E6<br>(0.75 MB after innovative<br>network compression and<br>weight binarization)                                        |
| # operations/frame****                                       | 76E6                                                                                 | 114E6                                                                                                                      | 114E6                                                                                                                      |
| targeted<br>throughput @<br>30fps (ops/frame *<br>framerate) | 2,280MOPS                                                                            | 3,420MOPS                                                                                                                  | 3,420MOPS                                                                                                                  |
| targeted<br>CogniVision<br>power <sup>*****</sup>            | 1mW (dominant<br>contribution: 0.56mW<br>deep learning accelerator)                  | 1.2mW (dominant<br>contribution: 0.8mW deep<br>learning accelerator)                                                       | 1.2mW (dominant<br>contribution: 0.8mW deep<br>learning accelerator)                                                       |

\* Detection is here performed on a frame basis (no tracking). Occlusion is not dealt with in these demonstrations, as no elegant solution has been found in the preliminary exploration we have performed in this area (due to the complexity of the task). If strictly needed, occlusion can be addressed in the cloud by occasionally having the cognitive camera send all the keypoints for frames where there is activity, and have the cloud deal with occlusion. Another possible approach is to generate a deep network that is able to perform this task within the capabilities of the CogniVision system on chip (i.e., MB-range weight memory, 20,000MOPS computational throughput).

\*\* Range of conditions used in deep learning simulations to estimate the achievable accuracy in the preliminary exploration (Caffe framework [BKL]); these are the same as the target conditions at CogniVision deployment.

\*\*\* Weight memory evaluated after training and compressing the AlexNet network for the accuracy target in the table (see proposal for the details of the techniques introduced to reduce the model size).

\*\*\*\* Number of operations (additions, multiplications, comparisons) per frame evaluated from the actual structure of the compressed network in the table, then scaled to VGA by realistically assuming a complexity increase (i.e., neurons, number of computations) by 12X compared to AlexNet at its 256x256 resolution (12X was evaluated by retraining the network with the same structure for VGA resolution).

\*\*\*\*\* The power of the deep learning accelerator is obtained as TOPS/(TOPS/W), where the throughput in TOPS (1 TOPS = 1,000,000 MOPS) is the one indicated in the table, and the energy efficiency of 50 TOPS/W comes from logic simulations of the DDPM accelerator in 28nm CMOS. The dominant wireless power is dictated by the receiver, and is 350μW from the preliminary results discussed in the text.
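The "targeted throughput" row of Table IV is the product of the operations per frame and the frame rate. A minimal Python sketch of that arithmetic follows; the function name `targeted_throughput_mops` is hypothetical and introduced only for illustration, while the operation counts and the 30fps frame rate are taken from the table.

```python
def targeted_throughput_mops(ops_per_frame, frame_rate_fps=30):
    """Return the throughput in MOPS (millions of operations per second).

    ops_per_frame: number of operations per frame (Table IV).
    frame_rate_fps: frame rate in frames per second (30fps in Table IV).
    """
    return ops_per_frame * frame_rate_fps / 1e6


# Operation counts per frame as listed in Table IV.
tasks = {
    "ImageNet classification": 76e6,
    "human detection": 114e6,
    "object detection": 114e6,
}

for name, ops in tasks.items():
    print(f"{name}: {targeted_throughput_mops(ops):,.0f} MOPS")
```

With the values in the table, this gives 2,280MOPS for the classification task and 3,420MOPS for the two detection tasks.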

Estimates in this table are generated under the following **assumptions**:

- the popular FOM of the imager is 10pJ/pixel (in line with reasonably good imagers with similar pixel size of 5μm and technology, although not best-in-class as this FOM is not critical to the overall power as shown in the example in Fig. D17)

- the energy/pixel of the feature extractor is estimated to be 22pJ/pixel in 28nm CMOS (i.e., only 2X lower than the recent silicon demonstration from our team [APA17], which is pessimistic compared to the preliminary simulation results obtained with the new feature extractor architecture that will be explored in the project). This pessimistic assumption will not impact the overall power estimate significantly, as the dominant contributions come from the deep learning accelerator and the radio transceiver

- the energy/frame in novelty assessment is equal to the energy in the feature extractor (estimated to be comparable from high-level simulations)

- the deep learning accelerator has an energy efficiency of 50TOPS/W, as found from post-synthesis logic simulations of a preliminary Verilog description of a small-scale DDPM accelerator (16x8 neurons) in 28nm CMOS

- memory energy per access is 30fJ/bit, in line with circuit simulations of an SRAM in 28nm CMOS

- transmitted wireless power is assumed to be 2mW (reduced to 2μW by the realistic activation rate of 0.01%, which corresponds to one transmission every 10,000 frames, or equivalently 33s)

- pixel activation probability in pre-saliency assessment is 5% (pessimistic, as it can be as low as 3.5% depending on the specific video, using the benchmark in [CDT12])

- novelty assessment identifies 20% of the features as novel on average (pessimistic, as this has been observed to be as low as 5% with the benchmark in [CDT12])

- no energy saving from irrelevant activity skipping is pessimistically being considered in the deep learning accelerator, as this seems to be dependent on the network from a preliminary analysis. Deeper analysis will be carried out during the execution of the project.

The above assumptions immediately lead to the numerical results in Fig. D17, by simply multiplying each power contribution by the corresponding activation rate (see above assumptions).
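The activation-rate weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the breakdown used in Fig. D17: the function name `average_power_w` is hypothetical, VGA resolution and 30fps are borrowed from Table IV, the imager is assumed to be always active, and the feature extractor is assumed to be gated by the 5% pixel activation probability.

```python
def average_power_w(contributions):
    """Sum (active power x activation rate) over all blocks.

    contributions: iterable of (active_power_watts, activation_rate)
    pairs, with activation_rate between 0 and 1.
    """
    return sum(power * rate for power, rate in contributions)


# Assumed operating point (VGA at 30fps, as in Table IV).
PIXELS_PER_FRAME = 640 * 480
FRAME_RATE_FPS = 30
pixels_per_second = PIXELS_PER_FRAME * FRAME_RATE_FPS

# Active power of each block from its energy/pixel assumption.
imager_w = 10e-12 * pixels_per_second      # 10pJ/pixel imager FOM
extractor_w = 22e-12 * pixels_per_second   # 22pJ/pixel feature extractor

total_w = average_power_w([
    (imager_w, 1.0),       # assumption: imager always active
    (extractor_w, 0.05),   # assumption: gated by 5% pixel activation
])
print(f"average power: {total_w * 1e6:.1f} uW")
```

The remaining contributions (novelty assessment, deep learning accelerator, memory and radio) would be added as further `(power, rate)` pairs in the same way.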



Fig. D23. The alignment of the "CogniVision" CRP program and the Smart Nation vision (cognitive cameras can be used to address several challenges and accelerate the fulfilment of the Smart Nation end goals).

Table V. Industrial collaborations of team members and adoption of their research work in areas that are relevant to the CogniVision project

| team member            | companies (Singapore)                                                                                                    | research topic                                                                                                                                                        | notes                                                                                                                                                                                                                                                                                                                                                                                                  |
|------------------------|--------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                        | Intel                                                                                                                    | ultra-low power digital                                                                                                                                               | research collaboration                                                                                                                                                                                                                                                                                                                                                                                 |
|                        |                                                                                                                          | signal processing                                                                                                                                                     |                                                                                                                                                                                                                                                                                                                                                                                                        |
|                        | Mediatek                                                                                                                 | energy-quality scalable<br>circuits                                                                                                                                   | Intellectual Property sharing, full<br>fabrication support                                                                                                                                                                                                                                                                                                                                             |
| Prof. Massimo ALIOTO   |                                                                                                                          | ultra-low power circuits                                                                                                                                              | Intellectual Property sharing, full                                                                                                                                                                                                                                                                                                                                                                    |
|                        | TSMC (Taiwan)                                                                                                            | for IoT                                                                                                                                                               | fabrication support                                                                                                                                                                                                                                                                                                                                                                                    |
|                        | Huawei, NeuroMem<br>Technologies and several<br>others                                                                   | ultra-low power<br>frontends for vision                                                                                                                               | possible licensing of previously developed vision technologies (under discussion)                                                                                                                                                                                                                                                                                                                      |
| Prof. FENG Jiashi      | Huawei, Qihoo 360,<br>Adobe, Snap on                                                                                     | deep learning and<br>computer vision (vehicle<br>detection, scene parsing,<br>human pose estimation,<br>)                                                             | research collaboration                                                                                                                                                                                                                                                                                                                                                                                 |
|                        | Panasonic R&D                                                                                                            | face<br>verification/detection                                                                                                                                        | adopted in Panasonic Face Pro system<br>(most accurate face recognition in NIST<br>IJB-A benchmark) and will be used in the<br>surveillance system managed by the<br>Singapore Ministry of Home Affairs                                                                                                                                                                                                |
| Prof. YEO Kiat Seng    | GlobalFoundries,<br>Samsung                                                                                              | RF device<br>characterization and<br>modeling                                                                                                                         | inductor design for RF, transformers,<br>varactors, VCOs, RF transistors                                                                                                                                                                                                                                                                                                                               |
|                        | MediaTek, Panasonic,<br>LTA, A*STAR,<br>Broadcom, Infineon                                                               | RF transceiver<br>architectures and power<br>amplifiers                                                                                                               | <ul> <li>has demonstrated the world's smallest<br/>on-chip low-pass filter (US Patent) with<br/>the broadest stop-band up to 52 times<br/>the cut-off frequency, i.e., 110GHz</li> <li>36G/24G front-end transceiver<br/>architectures with carrier suppression and<br/>ultra-low unwanted emissions, power<br/>amplifier and linearization techniques<br/>using active and passive devices</li> </ul> |
| Prof. Luca Benini      | Greenwaves<br>Technologies                                                                                               | parallel-ultra-low power<br>digital processor for<br>computer vision, deep-<br>learning accelerator                                                                   | commercially licensed                                                                                                                                                                                                                                                                                                                                                                                  |
|                        | Google, Micron,<br>STMicroelectronics,<br>Mentor Graphics,<br>Cadence                                                    | PULP open source<br>platform for near-sensor<br>analytics                                                                                                             | publicly acknowledged adoption                                                                                                                                                                                                                                                                                                                                                                         |
| Prof. CHEN Shoushun    | Samsung                                                                                                                  | High Dynamic Range<br>CMOS Image Sensor<br>System with Adaptive<br>Integration Time and<br>Multiple Readout<br>Channels" (US Patent)                                  | commercialization in progress: signed<br>NDA and disclosed patent details                                                                                                                                                                                                                                                                                                                              |
|                        | HILLHOUSE<br>TECHNOLOGY PTE LTD,<br>(Singapore-based startup<br>company)                                                 | A High Speed Motion<br>Detection Image Sensor"<br>(US patent US 9,628,738<br>B2 granted in July 2017)                                                                 | twelve-year exclusive licensing                                                                                                                                                                                                                                                                                                                                                                        |
| Prof. Dennis SYLVESTER | founded two startups:<br>1) Ambiq Micro in 2010,<br>based in Austin, TX<br>2) CubeWorks in 2013,<br>based in Ann Arbor, MI | 1) ultra-low power<br>components for<br>wearables and IoT<br>2) Michigan Micro Mote<br>(M3) platform (one M3<br>design includes imaging<br>based on infrequent<br>triggering) | 1) raised \$90M to date, lead is Kleiner<br>Perkins (VC that led Google funding)<br>2) Intel Capital is lead funder |