A Classifier to Distinguish between Cypriot Greek and Standard Modern Greek

Hanna Sababa, Athena Stassopoulou

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

The problem of discriminating between similar languages and dialects is one of the current challenges of natural language processing. In this paper, we describe the collection of a bidialectal corpus of Greek and the construction of a classifier to distinguish between Cypriot Greek (CG) and Standard Modern Greek (SMG). The corpus of CG and SMG was compiled from social media websites such as Facebook, Twitter and online forums. N-gram features were extracted and three classification algorithms were applied and tested on labeled sentences: multinomial naive Bayes (NB), linear support vector classifier (SVC) and logistic regression. All algorithms classified the test data with an accuracy of over 90%, with the multinomial NB classifier performing best, yielding a mean accuracy of 95%. This study adds to the existing body of work on the problem of discriminating between similar languages and is the first to examine CG and SMG. The results demonstrate the feasibility of an accurate Greek dialect classifier for academic or applied purposes.
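The pipeline outlined in the abstract (n-gram features fed to a multinomial naive Bayes classifier) can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the paper: the abstract does not say whether the n-grams are character-level or word-level (character n-grams of length 1 to 3 are assumed), it does not name the software used (plain Python is used here), and the eight training sentences are invented toy examples in informal spelling, not items from the authors' corpus. The names `char_ngrams` and `MultinomialNB` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max (assumed feature type)."""
    padded = f" {text.lower()} "  # pad so word-initial/final n-grams are captured
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class MultinomialNB:
    """Tiny multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)   # documents per class (for priors)
        self.counts = {}                    # n-gram counts per class
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(char_ngrams(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def log_scores(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, grams in self.counts.items():
            score = math.log(self.doc_counts[label] / n_docs)  # log prior
            denom = self.totals[label] + self.alpha * len(self.vocab)
            for g in char_ngrams(text):
                if g in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((grams[g] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy data for illustration only; NOT taken from the paper's corpus.
texts = [
    "ίντα μπου κάμνεις", "τζαι εγώ", "εν καλά", "έννεν έτσι",   # CG-style
    "τι κάνεις", "και εγώ", "είναι καλά", "δεν είναι έτσι",      # SMG-style
]
labels = ["CG"] * 4 + ["SMG"] * 4

model = MultinomialNB().fit(texts, labels)

for sentence in ["ίντα μπου κάμνεις τζαι εγώ", "τι κάνεις και είναι καλά"]:
    print(sentence, "->", model.predict(sentence))
```

With so little data the sketch only shows the mechanics: dialect-specific character sequences such as "τζ" or "μπου" raise the likelihood of one class over the other. The accuracies reported in the abstract come from the authors' own corpus and experimental setup, which this toy example does not reproduce.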

Original language: English
Title of host publication: 2018 5th International Conference on Social Networks Analysis, Management and Security, SNAMS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 251-255
Number of pages: 5
ISBN (Electronic): 9781538695883
Publication status: Published - 30 Nov 2018
Event: 5th International Conference on Social Networks Analysis, Management and Security, SNAMS 2018 - Valencia, Spain
Duration: 15 Oct 2018 - 18 Oct 2018

Publication series

Name: 2018 5th International Conference on Social Networks Analysis, Management and Security, SNAMS 2018

Conference

Conference: 5th International Conference on Social Networks Analysis, Management and Security, SNAMS 2018
Country/Territory: Spain
City: Valencia
Period: 15/10/18 - 18/10/18

Keywords

  • feature extraction
  • machine learning
  • natural language processing
  • natural languages
  • statistical learning
