Adversarial Attacks Against Deep Learning-Based Speech Recognition Systems

2021 
Due to the development of deep learning technologies, automatic speech recognition (ASR) systems have become a new mode of human–machine interaction. However, deep neural networks (DNNs) are vulnerable to adversarial examples (AEs), which are crafted by adding small perturbations to normal examples that fool the DNN while remaining undetectable to human listeners. Practical adversarial attacks against ASR systems, especially intelligent voice control (IVC) devices such as Google Home, Cortana, and Echo, whose models are not publicly available, are challenging because of physical-world distortions and the black-box nature of the models. By exploiting the vulnerability of ASR algorithms and simulating real-world attack scenarios, we propose the CommanderSong attack, which automatically integrates malicious commands into a song to generate AEs. When the AEs are played over the air and recorded, the malicious commands in the recorded audio are recognized by the white-box “Aspire Chain Model” while remaining unnoticeable to ordinary users. Furthermore, by alternately generating AEs with the common white-box “Aspire Chain Model” and a small substitute model of the target black-box model, our Devil’s Whisper attack effectively compromises popular commercial IVC devices (Google Assistant, Google Home, Amazon Echo, and Microsoft Cortana) in the real world. For 98% of the target commands, our approach generates at least one AE that attacks the target devices. A user study on Amazon Mechanical Turk shows that none of the participants could identify any command from our AEs after listening to them once. We also demonstrate that such attacks can spread through the Internet (e.g., YouTube) and radio signals, potentially affecting millions of ASR users.
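As a rough illustration of how audio AEs of this kind are typically crafted (a minimal gradient-based sketch, not the CommanderSong or Devil’s Whisper implementation; the model, token ids, and perturbation budget below are assumptions for illustration only), a bounded perturbation is optimized over a carrier clip so that a differentiable white-box ASR model decodes a hidden target command:

```python
# Illustrative sketch only: gradient-based crafting of an audio adversarial
# example that pushes a differentiable ASR model toward a target transcription
# while keeping the perturbation small. A real attack would use the actual
# white-box model (e.g., the Kaldi "Aspire Chain Model" named in the paper).
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyASR(nn.Module):
    """Hypothetical stand-in acoustic model emitting per-frame log-probabilities."""
    def __init__(self, vocab_size=29, hidden=64):
        super().__init__()
        self.conv = nn.Conv1d(1, hidden, kernel_size=160, stride=80)
        self.proj = nn.Linear(hidden, vocab_size)

    def forward(self, wav):                       # wav: (batch, samples)
        feats = self.conv(wav.unsqueeze(1))       # (batch, hidden, frames)
        logits = self.proj(feats.transpose(1, 2)) # (batch, frames, vocab)
        return logits.log_softmax(-1)

model = TinyASR().eval()
ctc = nn.CTCLoss(blank=0)

song = torch.randn(1, 16000)        # carrier audio, e.g. a one-second song clip
target = torch.tensor([[8, 5, 25]]) # token ids of the hidden command (assumed)
delta = torch.zeros_like(song, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-3)
eps = 0.02                          # perturbation budget keeping the AE inconspicuous

for step in range(200):
    adv = song + delta.clamp(-eps, eps)
    log_probs = model(adv).transpose(0, 1)        # (frames, batch, vocab) for CTCLoss
    in_len = torch.full((1,), log_probs.size(0), dtype=torch.long)
    tgt_len = torch.tensor([target.size(1)])
    loss = ctc(log_probs, target, in_len, tgt_len)
    opt.zero_grad()
    loss.backward()
    opt.step()

adversarial_example = (song + delta.clamp(-eps, eps)).detach()
```

In a black-box setting such as Devil’s Whisper, the same optimization loop would alternate between the open white-box model and a small substitute model trained to mimic the target device, so that the resulting AEs transfer to the commercial service.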