I am a machine learning researcher and engineer at Bose Research, where I work on research and development of NLP and audio applications. Previously I was a Collaborator and Researcher at the Music Technology Group (MTG), Universitat Pompeu Fabra, Barcelona, Spain. I received my Ph.D. from Georgetown University [GUCL Group, CorpLing Lab] with a focus on Computational Linguistics. My research interests include machine learning and deep learning for audio (speech, music, environmental audio understanding), natural language processing (NLP), speech prosody, and computational musicology.
CV email music LinkedIn
J. Williams, T. Azim, A.-M. Piskopani, A. Chamberlain and S. Zhang, "Socio-Technical Trust For Multi-Modal Hearing Assistive Technology," 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSPW59220.2023.10193586. [IEEE Xplore]
N Shashaank, Berker Banar, Mohammad Rasool Izadi, Jeremy Kemmerer, Shuo Zhang, Chuan-Che (Jeff) Huang. HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones. Proceedings of ICASSP 2023, Rhodes Island, Greece. [arXiv] [IEEE Xplore]
Alnajjar, K., Hämäläinen, M., Zhang, S. Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos. Proceedings of the Third Workshop on Figurative Language Processing (FigLang) at EMNLP 2022. [U of Helsinki] [Zenodo] [ACL Anthology]
Zhang, S. Data mining Mandarin tone contour shapes. Proceedings of the 16th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology (at ACL 2019). Association for Computational Linguistics: Florence, Italy, August 2019. [preprint] [ACL Anthology]
Caro, R., Zhang, S., Serra, X. Quantitative analysis of the relationship between linguistic tones and melody in jingju using music scores. Proceedings of the 3rd International Workshop on Digital Libraries for Musicology (DLfM) at the International Society for Music Information Retrieval Conference (ISMIR) 2017, Shanghai, China, October 2017. Published by ACM-ICPS. [MTG] [UPF e-repositori] [ACM Digital Library]
Zhang, S., Caro, R., Serra, X. Understanding the expressive functions of jingju metrical patterns through lyrics text mining. Proceedings of the 18th International Society for Music Information Retrieval (ISMIR) conference, Suzhou, China, October 2017. [MTG] [UPF e-repositori]
Zhang, S. RankLyrics: A ranking-based approach to automatic song lyrics generation. MASCSLL'17. DC: George Washington University, May 2017. [pdf] [code]
Zhang, S., Caro, R., Serra, X. Understanding the expressive functions of jingju music rhythmic types through lyrics text mining (poster abstracts). Proceedings of the 30th International Florida Artificial Intelligence Research Society Conference (FLAIRS 30, published by AAAI Press), May 22-24, 2017, Marco Island, FL. [pdf @ AAAI Press]
Zhang, S., Zeldes, A. GitDOX: A Linked Version Controlled Online XML Editor for Manuscript Transcription. Proceedings of the 30th International Florida Artificial Intelligence Research Society Conference (FLAIRS 30, published by AAAI Press), May 22-24, 2017, Marco Island, FL. [pdf @ AAAI Press]
Zhang, S. Mining linguistic tone patterns with symbolic representation. Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology (SIGMORPHON at ACL 2016), Association for Computational Linguistics, Berlin, Germany, August 2016. [ACL Anthology]
Zeldes, A., Zhang, S. When Schemas Change Rules Help: A Configurable Approach to Coreference beyond OntoNotes. In: Proceedings of the NAACL 2016 Workshop on Coreference Resolution Beyond OntoNotes (CORBON). Association for Computational Linguistics, San Diego, CA, June 2016. [ACL Anthology]
Zhang, S., Caro, R., Serra, X. Predicting pairwise pitch contour relations based on linguistic tone information in Beijing opera singing. Proceedings of the 16th International Society for Music Information Retrieval (ISMIR) conference, Malaga, Spain, October 26-30, 2015. [MTG]
Zhang, S. Analyze linguistic tone patterns using time-series data mining techniques. Workshop on Computational Phonology and Morphology (CompMorPhon15), July 11, 2015, Linguistics Summer Institute (Big Data), University of Chicago.
Zhang, S. Modeling Frequency Effects in Mandarin Zero-onset Variation. New Ways of Analyzing Variation (NWAV 43) annual conference, Chicago, October 23-26, 2014. [html]
Zhang, S., Caro, R., Serra, X. Study of the similarity between linguistic tones and melodic pitch contours in Beijing Opera singing. Proceedings of the 15th International Society for Music Information Retrieval (ISMIR) Conference, pp. 345-348. Taiwan, October 27-31, 2014. [pdf final version] [MTG]
Zhang, S. Symposium: New Research in Asian and American Music, ACMR Newsletter, v18, n1, June 2012. (newsletter article)
Zhang, S. Speech-to-song Illusion in MC: Acoustic Parameter vs. Perception. Society for Music Perception and Cognition bi-annual Meeting, Rochester, NY, Aug. 11-14, 2011.
Zhang, S. Charles Seeger and the Study of Music Semiotics. Symposium: New Research in Asian and American Music (Session Chair), University of Pittsburgh, Mar. 24, 2012.
Zhang, S. Music and Language: Brain Modularity, Domain Specificity, and Sharing with General Cognitive Capacity. Phonetic Association of China/Linguistic Association of China 6th Symposium on Phonetics and Language Acquisition, Beijing, China, May 2011.
Zhang, S. Music and Cognitive Linguistics. The 8th China International Forum on Cognitive Linguistics, Beijing University of Astronautics and Aeronautics, October 2010 [CIFCL2010]. [pdf-long version]
Zhang, S. Speech-to-Song Illusion: Evidence from MC. Sino-European Winter School of Logic, Language, and Computation ([SELLC2010SS], hosted by Dept of Mathematics, University of Helsinki, Finland & Institute of Logic, Sun Yat-sen University, Guangzhou, China), Dec. 2010.
2022 - Hearing augmentation and wearable system with localized feedback (Bose)
2021 - Spatialized virtual personal assistant (Bose)
2020 - Systems and methods for augmented reality content harvesting and information extraction (Bose)
2023 - Tufts University (Data Science for Urban Sustainability)
2022 - Tufts University (Data Science for Urban Sustainability)
2021 - RE:WORK Deep Learning Summit - NLP and Conversation AI
2020 - Artificial Intelligence Festival APAC
2020 - NLP4MusA Workshop @ ISMIR 2020
2020 - Artificial Intelligence Festival by AI Accelerator Institute
2019 - AI Accelerator Summit Boston
2019 - Global AI Conference Boston
2019 - Harvard University
2019 - RE:WORK Deep Learning Summit Boston
2019 - Tufts University (Data Science for Urban Sustainability)
2018 - University of Washington (CLMS + Linguistics Department Colloquium)
2018 - Bose
2018 - Spotify
2016 - NLP in MIR tutorial at ISMIR 16 NYC
2015 - CompMusic visits China (with Xavier Serra, Rafael Caro), including 10+ institutions in Beijing and Shanghai, such as Peking University, Tsinghua University, Fudan University, Communication University of China, Ningbo University, China Academy of Social Sciences, Shanghai Conservatory, China Conservatory, Central Conservatory, Tencent, Douban, Dolby Labs, etc.
2010 - Workshops / invited talks on Chinese music and music theory at Carnegie Library of Pittsburgh, Duquesne University, University of Pittsburgh, Indiana University of Pennsylvania, TWCCO Singapore, etc.
EURASIP Journal on Audio, Speech, and Music Processing
EUSIPCO2023, NLP4DH2023, DCASE2023, ICASSP2023, EMNLP2022, IEEE-MMSP2022, ISMIR2022, DCASE2022, NLP4DH2022, ACM-MultiMedia2022, ICASSP2022, ARR2022(ACL/NAACL), NLP4DH2021(ICON), ACM-Multimedia-Asia2021, IEEE-MMSP2021, NLP4MUSA2021, ISMIR2021, ACM-MultiMedia2021, EUSIPCO2021, EMNLP2021, ICASSP2021, ACL2021, NAACL2021, EACL2021, DCASE2020, EMNLP2020, EUSIPCO2020, DCASE2019, ACL2020, ISMIR2020, ACL 2019, NAACL2019, ECNLP2019(WWW), ECNLP2020(WWW), ECNLP2020(ACL), NLP4MusA2020(ISMIR), ACAL2014, MASCSLL2017
AI Accelerator Institute Ambassador
Mentor, FourthBrain.ai - a startup backed by Andrew Ng funds
Co-Chair of Industrial Liaisons, DCASE Workshop (2021-present)
Computational Models for the discovery of world music. Funded by European Research Council, PI: Xavier Serra. Music Technology Group (MTG), Universitat Pompeu Fabra (UPF). Resulted in publications listed here. Contribution: 2013-17.
ANNIS originally stood for ANNotation of Information Structure, a project at Humboldt-Universität zu Berlin, Germany. It is a web application for search and visualization of linguistic corpora, funded by the German Research Foundation (DFG), etc. Contribution: 2013-17.
XRENNER is the eXternally configurable REference and Non Named Entity Recognizer, developed by Amir Zeldes with my contributions (see the Zeldes and Zhang 2016 paper at the CORBON workshop at NAACL 2016). See the website for a live demo of coreference and entity resolution (including both non-named and named entities). Contribution: 2015-16.
Zeldes, A., Zhang, S. 2016. When Schemas Change Rules Help: A Configurable Approach to Coreference beyond OntoNotes. Proceedings of CORBON Workshop at NAACL 2016.
Web application for collaborative annotation projects with version control using GitHub as backend, funded by the US National Endowment for the Humanities (NEH), etc., and part of Coptic Scriptorium. GitDOX is on the list of awesome-nlp tools starred by 10.1K developers on GitHub. Contribution: 2016.
Zhang, S., Zeldes, A. 2017. GitDOX: A Linked Version Controlled Online XML Editor for Manuscript Transcription. Proceedings of FLAIRS30.
Annotated syllable-level segmentation, phonetic, melodic and linguistic tone information for a set of arias in Beijing opera under the CompMusic Project.
Zhang, S., Caro, R., Serra, X. 2015. Predicting pairwise pitch contour relations based on linguistic tone information in Beijing opera singing. Proc. ISMIR 15.
Zhang, S., Caro, R., Serra, X. 2014. Study of the similarity between linguistic tones and melodic pitch contours in Beijing Opera singing. Proc. ISMIR 14.
Comprehensive collection of aria lyrics acquired by crawling the online jingju lyrics database xikao, under the CompMusic Project.
Zhang, S., Caro, R., Serra, X. 2017. Understanding the expressive functions of jingju metrical patterns through lyrics text mining. Proc. ISMIR 17.
ANNotation of Information Structure.
Thomas Krause; Weißenfels Benjamin; Tom R.; IrinaGlushanok; Martin Klotz; Shuo Zhang; Luke Gessler; Amir Zeldes; fab-bar; Stephan Druskat; adrianeboyd; egon w. stemle; Thomas N; Lari Lampen; Florian Petran (2019, October 18). korpling/ANNIS beta.3 (Version beta.3). Zenodo. http://doi.org/10.5281/zenodo.3507129
Multimodal Metaphor Corpus (collaboration with University of Helsinki).
Alnajjar, K., Hämäläinen, M., Zhang, S. Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos. Proceedings of the Third Workshop on Figurative Language Processing (FigLang) at EMNLP 2022. Zenodo.