About Me

My name is Zhengzhong Liu (people often refer to me by my nickname, Hector). I am a Research Scientist at Petuum and a PhD researcher in Natural Language Processing and Computational Linguistics at Carnegie Mellon University, working with Professor Teruko Mitamura and Professor Eduard Hovy. I also work closely with Professor Eric Xing on the CASL project. Prior to my PhD, I obtained my master's degree at CMU with Teruko and Ed. Before that, I obtained my bachelor's degree from the Department of Computing at the Hong Kong Polytechnic University, working with Professor Qin Lu.

News

Recently, we combined our ASYML projects with other OSS projects and launched the CASL (Composable, Automatic, and Scalable ML) open-source effort, led by Professor Eric Xing, to provide a unified umbrella for a production and industrial AI platform.

Research

I believe that understanding linguistic problems allows one to apply the proper computational methods to solve them. During my PhD, I have focused on modeling event semantics, which includes event detection (TAC’17, TAC’15), anaphora resolution (NAACL’16, NAACL’16, LREC’14), script modeling (COLING’18), and salience modeling (EMNLP’18).

I am broadly interested in solving problems by combining machine learning techniques with linguistic insights and human knowledge. In the early days, I worked on semantic web problems such as entity disambiguation (WWW’12, entity linking) and slot filling. In particular, I enjoy incorporating knowledge into NLP problems, such as information retrieval (SIGIR’18, CIKM’17) and core NLP tasks (ACL’16).

Open Source

I am also a fan of developing high-quality, open-source NLP toolkits. I have recently worked on the following projects:

  1. Texar: A modularized toolkit for neural-network-based text generation and more. Texar was nominated for the best demo paper award at ACL 2019.
  2. Forte: A flexible and highly composable NLP toolkit for a wide range of text applications (including IR, NLU, and NLG). Check out our AAAI 2020 and GTC 2020 tutorials for its design philosophy.
  3. Stave: A general-purpose, modern annotation toolkit (under development).

The ASYML project is now part of the CASL project!

Community

I have served as a Program Committee member (reviewer) for ACL 2019 (outstanding reviewer) and 2020; ACL SRW 2020; NAACL 2019 (Kudos Reviewer); EMNLP 2018 and 2020; NLPCC 2017, 2018, 2019, and 2020; and CoNLL 2019.

Thesis

My thesis proposal summarizes some of my previous work. It proposes solutions to two problems we have observed in event research:

  1. Computational Perspective
    • Problem: A labeled-data dilemma for difficult semantic tasks. Such tasks are hard for humans to annotate because of their complexity, yet that same complexity means machines need a lot of training data to learn them.
    • Proposed solution: We propose to bring data together, for example by jointly using multiple datasets or by looking for incidental supervision signals.
  2. Linguistic Perspective
    • Problem: Common approaches to analyzing event relations typically assume that shared frame elements are identical, but this is not always the case.
    • Proposed solution: We adopt the quasi-identity theory of coreference and propose to address the problem within a slightly different framework.