Text Mining

Main Books and Articles

FLORESCU, Daniela. Managing Semi-Structured Data. Queue, New York, v. 3, nº 8, p. 18-24, oct. 2005. [Gives an overview of semi-structured data and its difficulties, pointing out future needs.]
McCALLUM, Andrew. Information Extraction: Distilling Structured Data from Unstructured Text. Queue, New York, v. 3, nº 9, p. 48-57, nov. 2005. [Techniques and tools for extracting information from textual Web documents.]
NOY, Natalya. Order from Chaos. Queue, New York, v. 3, nº 8, p. 42-49, oct. 2005. [Discusses ontologies and their challenges, and presents a popular platform for representing them (Protégé).]
SHAHAF, Dafna et al. Information Cartography. Communications of the ACM, New York, v. 58, nº 11, p. 62-73, nov. 2015. [Discusses the creation and use of "metro maps" to obtain lists of related information.]

Other Interesting Books and Articles

ANTHES, Gary. Topic Models vs. Unstructured Data. Communications of the ACM, New York, v. 53, nº 12, p. 16-18, dec. 2010.
ARASU, Arvind; GARCIA-MOLINA, Hector. Extracting Structured Data from Web Pages. In: ACM SIGMOD International Conference on Management of Data, 2003, San Diego, CA. Proceedings... New York: ACM Press, 2003. p. 337-348.
BLEI, David M. Probabilistic Topic Models. Communications of the ACM, New York, v. 55, nº 4, p. 77-84, apr. 2012.
BOSWORTH, Adam. Learning from the Web. Queue, New York, v. 3, nº 8, p. 26-32, oct. 2005. [Discusses the lessons learned from publishing and retrieving large amounts of semi-structured information on the Web.]
CRONAN, Timothy P.; FOLTZ, C. Bryan; JONES, Thomas W. Piracy, Computer Crime and IS Misuse at the University. Communications of the ACM, New York, v. 49, nº 6, p. 84-90, jun. 2006.
DÖRRE, Jochen; GERSTL, Peter; SEIFFERT, Roland. Text Mining: Finding Nuggets in Mountains of Textual Data. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 5, 1999, San Diego, CA. Proceedings... New York: ACM Press, 1999. p. 398-401.
FAN, Weiguo et al. Tapping the Power of Text Mining. Communications of the ACM, New York, v. 49, nº 9, p. 76-82, sep. 2006. [Discusses potential and current applications of text mining.]
GREGG, Dawn G.; WALCZAK, Steven. Adaptive Web Information Extraction. Communications of the ACM, New York, v. 49, nº 5, p. 78-84, may 2006. [Presents a prototype that uses techniques for dynamic, adaptive extraction of information from HTML pages.]
HALEVY, Alon. Why Your Data Won’t Mix. Queue, New York, v. 3, nº 8, p. 50-58, oct. 2005.
LI, Hang. Language Models: Past, Present and Future. Communications of the ACM, New York, v. 65, nº 7, p. 56-63, jul. 2022. DOI 10.1145/3490443.
HEARST, Marti A. Untangling Text Data Mining. In: Conference on Association for Computational Linguistics, 37, 1999, College Park, MD. Proceedings... [s.l.]: Association for Computational Linguistics, 1999. p. 3-10.
KROEZE, Jan H.; MATTHEE, Machdel C.; BOTHMA, Theo J. D. Differentiating Data- and Text-Mining Terminology. In: Annual Research Conference on Enablement through Technology, 2003, [South Africa]. Proceedings... [s.l.]: South African Institute for Computer Scientists and Information Technologists, 2003, p. 93-101.
KUECHLER, William L. Business Applications of Unstructured Text. Communications of the ACM, New York, v. 50, nº 10, p. 86-93, oct. 2007. [Overview of the architecture of text mining applications.]
LAENDER, Alberto H. F. et al. A Brief Survey of Web Data Extraction Tools. ACM SIGMOD Record, [s.l.], v. 31, nº 2, p. 84-93, jun. 2002.
LI, Jiexun; ZHENG, Rong; CHEN, Hsinchun. From Fingerprint to Writeprint. Communications of the ACM, New York, v. 49, nº 4, p. 76-82, apr. 2006. [Discusses ways to identify the author of a text from a variety of statistics.]
LOPES, Maria C. S. Mineração de Dados Textuais Utilizando Técnicas de Clustering para o Idioma Português. 2004. Thesis (Doctorate in Civil Engineering) - COPPE, Universidade Federal do Rio de Janeiro, Rio de Janeiro.
MATSUDAIRA, Kate. Capturing and Structuring Data Mined from the Web. Communications of the ACM, New York, v. 57, nº 3, p. 10-11, mar. 2014.
ROUSSINOV, Dmitri et al. Beyond Keywords: Automated Question Answering on the Web. Communications of the ACM, New York, v. 51, nº 9, p. 60-65, sep. 2008. [Proposes extending keyword-based search methods to the analysis of natural-language questions.]
SPERBERG-McQUEEN, C. M. XML and Semi-Structured Data. Queue, New York, v. 3, nº 8, p. 34-41, oct. 2005. [Discusses the philosophy XML adopts for handling data that does not follow the traditional relational structure.]
SUVER, Chris. The Cost of Data. Queue, New York, v. 3, nº 8, p. 62-64, oct. 2005. [Argues that the term “semi-structured data” is misleading, because the real issue is that the data is not precisely organized as the result of a cost-benefit analysis.]
TANG, Ruixiang et al. The Science of Detecting LLM-Generated Text. Communications of the ACM, New York, v. 67, nº 4, p. 50-59, apr. 2024. DOI 10.1145/3624725.
VOORHEES, Ellen M. TREC: Continuing Information Retrieval's Tradition of Experimentation. Communications of the ACM, New York, v. 50, nº 11, p. 51-54, nov. 2007. [Describes TREC, an initiative to establish a benchmark for evaluating information retrieval algorithms.]
ZAÏANE, Osmar R.; ANTONIE, Maria-Luiza. Classifying Text Documents by Associating Terms with Text Categories. In: Australasian Conference on Database Technologies, 13, 2002, Melbourne, Australia. Proceedings... [s.l.]: Australian Computer Society, 2002, p. 215-222.
ZHANG, Ce et al. DeepDive: Declarative Knowledge Base Construction. Communications of the ACM, New York, v. 60, nº 5, p. 93-101, may 2017.
