De-duping URLs with Sequence-to-Sequence Neural Networks

Published in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2017

Traditional de-duping methods are usually limited to heavily engineered rule-matching strategies. In this work, we propose a novel URL de-duping framework based on sequence-to-sequence (Seq2Seq) neural networks. A