File:Topic detection in online chat (IA topicdetectionin109454513).pdf

Original file (1,275 × 1,650 pixels, file size: 624 KB, MIME type: application/pdf, 104 pages)

Summary

Topic detection in online chat
Author
Durham, Jonathan S.
Title
Topic detection in online chat
Publisher
Monterey, California. Naval Postgraduate School
Description

The ubiquity of Internet chat applications has benefited many different segments of society. It also creates opportunities for criminal enterprise, terrorism, and espionage. This thesis proposes statistical Natural Language Processing (NLP) methods for creating systems that would detect the topic of chat in support of larger NLP goals such as information retrieval, text classification, and illicit activity detection. We propose a novel method for determining the topic of chat discourse. We trained Latent Dirichlet Allocation (LDA) models on source documents and then used inferred topic distributions as feature vectors for a Support Vector Machine (SVM) classification system. We constructed LDA models in three ways. First, we considered the collective posts of authors as documents, hypothesizing that we could detect the topic physics given only one side of the conversation. The resultant classifiers obtained F-scores of 0.906. Next, we considered individual posts as documents, hypothesizing that we could detect physics posts. The resultant classifiers obtained F-scores of 0.481. Finally, we considered physics textbook paragraphs as documents, hypothesizing that we could determine the topic of an author or a post based on an LDA model created from a textbook and a sample of noisy chat. The resultant classifiers obtained F-scores of 0.848 and 0.536, respectively.
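The core pipeline described in the abstract (LDA topic proportions used as feature vectors for an SVM) can be sketched as below. This is a minimal illustration assuming scikit-learn's `CountVectorizer`, `LatentDirichletAllocation`, and `SVC`; the thesis summary does not say which toolkit was used, and the helper name `build_topic_classifier`, the topic count, the linear kernel, and the toy documents and labels are all hypothetical.

```python
# Hypothetical sketch: LDA topic distributions as SVM feature vectors.
# Assumes scikit-learn; the thesis's own tooling may differ.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_topic_classifier(n_topics: int = 10, seed: int = 0):
    """Bag-of-words counts -> per-document topic proportions -> linear SVM."""
    return make_pipeline(
        CountVectorizer(stop_words="english"),  # raw term counts, as LDA expects
        LatentDirichletAllocation(n_components=n_topics, random_state=seed),
        SVC(kernel="linear"),  # classifies on the topic-proportion vectors
    )


# Made-up toy "documents": each string stands in for one author's
# collected posts (label 1 = physics, 0 = other).
docs = [
    "force mass acceleration newton momentum",
    "energy quantum photon wavelength electron",
    "velocity gravity orbit mass force",
    "lol see you at the party tonight",
    "what movie should we watch later",
    "pizza or burgers for dinner tonight",
]
labels = [1, 1, 1, 0, 0, 0]

clf = build_topic_classifier(n_topics=3)
clf.fit(docs, labels)

# The features the SVM actually sees: one topic distribution per document.
topics = clf[:-1].transform(docs)
print(topics.round(2))
print(clf.predict(["the electron gains energy from the photon"]))
```

In the textbook variant described above, the LDA stage would instead be fitted on textbook paragraphs and then used only to infer topic proportions for chat; the sketch fits both stages on the same toy documents for brevity. F-scores like those reported could be computed on held-out data with `sklearn.metrics.f1_score`.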


Subjects
Internet
Language
English
Publication date
September 2009
Current location
IA Collections: navalpostgraduateschoollibrary; fedlink
Accession number
topicdetectionin109454513
Source
Internet Archive identifier: topicdetectionin109454513
https://archive.org/download/topicdetectionin109454513/topicdetectionin109454513.pdf

Licensing

Public domain
This work is in the public domain in the United States because it is a work prepared by an officer or employee of the United States Government as part of that person’s official duties under the terms of Title 17, Chapter 1, Section 105 of the US Code. Note: This only applies to original works of the Federal Government and not to the work of any individual U.S. state, territory, commonwealth, county, municipality, or any other subdivision. This template also does not apply to postage stamp designs published by the United States Postal Service since 1978. (See § 313.6(C)(1) of Compendium of U.S. Copyright Office Practices). It also does not apply to certain US coins; see The US Mint Terms of Use.

File history

Date/Time: 13:02, 25 July 2020 (current version)
Dimensions: 1,275 × 1,650, 104 pages (624 KB)
Comment: FEDLINK - United States Federal Collection topicdetectionin109454513 (User talk:Fæ/IA books#Fork8) (batch 1993-2020 #30745)
