Author Identification on Data Set Scale Size : A Systematic Survey

Author identification is a technique for identifying the author of an anonymous e-text. It has a history of nearly 130 years, starting with the work of Mendenhall in 1887. Author identification is one of the applications of text mining. This paper presents a survey of current techniques for identifying the author of an anonymous text document. We organize the survey by the scale of the data set used as training data in experiments. We categorize data sets as large scale, such as essays and novels, or small scale, such as tweets and blog posts. Author identification techniques are evaluated on the accuracy of identification and the size of the data set used in the training phase. We conclude with observations and directions for future work in this area.

Title Author Identification on Data Set Scale Size : A Systematic Survey
Pages 14
Sponsors Universiti Kebangsaan Malaysia
Publisher Canadian Arena of Applied Scientific Research Ltd | Canada
ISBN 978-0-9948937-4-1
DOI http://dx.doi.org/10.18797/caasr/2ndiciet/iccse/2016/05/05/11
Conference CAASR Second International Conference on Innovative Engineering and Technologies & CAASR International Conference on Civil and Structural Engineering

Table of Contents

The interference effect between adjacent buildings is a considerable factor in assessing the wind load on structures. This article discusses a study carried out to understand the effect of the proximity of the interfering building (nearby building) on the wind loading of the principal building (building under study). The evaluation is based on three d

This paper highlights Steel-Concrete and Profiled Steel Sheet Dry Board (PSSDB) composite construction systems that have been researched widely by the authors. The paper begins by briefly explaining the principle of composite construction, followed by an account of the developments that have taken place in the past. Research findings on composit

Opinion mining is one of the emerging areas of research in today's world of the internet. An opinion is the private state of an individual, and as such, it represents the individual's ideas, beliefs, assessments, judgments and evaluations about a specific subject or item. Opinions can be collected from the e-commerce websites such as Flipkart, A

In next-generation networks, the Wireless Multimedia Sensor Network will enhance the capability of the future internet in terms of performance and speed. Managing traffic in the network is a key factor for the future internet. The purpose of this paper is to study various scheduling strategies for traffic classification and efficient algorithm fo

The Internet plays a significant role in communication by way of information exchange through social media. Communication through social media fails to provide two main aspects: security and authentication of its users. The lack of authentication gives an attacker room to inject malicious traffic under a fake identity.

Almost 80% of global digital data is present in unstructured format. These voluminous data require analysis to extract their useful hidden meaning. Text classification is considered one of the most popular and efficient methods to manage and process these data, which are widespread and increasing exponentially. Text classification a

The comb-like histogram of DCT coefficients on each sub band is one of the main fingerprints of a JPEG compressed image. An anti-forensic method proposed by Stamm and Liu uses the dithering technique to remove this fingerprint. The dithering technique involves adding properly designed noise to each DCT coefficient. Due to this the qualit

The world is connecting ever more through social networking sites and online shopping applications, which defines a Big Data world. To process heterogeneous, high-speed and high-volume data, many techniques have been proposed and implemented. Big data clustering is one of the efficacious methods for data analytics. In the age of Big Data, Reco

This paper presents a detailed study of the energy consumption of the different Java Collection Framework (JCF) implementations. For each method of an implementation in this framework, we present its energy consumption when handling different amounts of data. Knowing the greenest methods for each implementation, we present an energy optimiza

Author identification is a technique for identifying the author of an anonymous e-text. It has a history of nearly 130 years, starting with the work of Mendenhall in 1887. Author identification is one of the applications of text mining. This paper presents a survey of current techniques for identifying the author of an anonymous text document. We outline

The concrete prepared using geopolymer technology is eco-friendly and could be considered as a part of the sustainable development. This article summarizes the study carried out to understand the effect of geopolymer concrete (GPC) in an exterior beam-column joint subjected to cyclic loading with different molarities (10, 12, 14, and 16 M).

The Underwater Acoustic (UWA) communications channel is considered one of the most challenging transmission media: it is a band-limited, time-variant channel that exhibits severe path loss, ambient noise, multipath and Doppler effects, which affect the Bit Error Rate (BER) of the transmitted data and the channel spectral efficiency and limit

Author identification is a method of finding the author of an anonymous document. Features play a vital role in capturing an author's writing style. This paper focuses on the role of features in this process and discusses how they affect it. Feature selection in the process depends on the writing style which is followed

"No quality data, no quality results". Preprocessing is the most important task in Sentiment Analysis, as it passes data to further stages. It focuses on transforming data into a form that can be used easily and effectively as input in many domains. In this paper, we introduce a novel algorithm for the preprocessing of text using distr

Diabetes Mellitus is a serious bane in urban and rural communities throughout the world. India has an increasing population of diabetics, with 65.1 million in 2015, up from 50.8 million in 2010. Diabetes in India takes a death toll of 25.4% per 1 lakh population and is likely to become much higher because of the unawareness prominent in rural and u

Association rule mining is an important concept in data mining that finds the interesting relationships that exist between items. However, association rule mining does not consider the weight or profit of an item; it considers only the frequency of the item in a database. High utility itemset considers the number of items in a transaction

Autism is a complex neurobehavioral disorder that includes impairments in social interaction, developmental language and communication skills combined with rigid, repetitive behaviors. But due to unawareness among the public, it is better known as a disease than a disorder. In order to overcome this ignorance among the public and to diagnose f

This paper presents the results of an experimental investigation conducted to characterize the joint strengths and failure modes in adhesively bonded single-lap glass fiber-reinforced polymer (GFRP) joints. The joints were composed of GFRP adherends having the same stiffness as the members. The design parameters investigated in the study we

Due to the increased use of wireless devices, effective routing techniques have become a topic of interest for researchers. The most popular dynamic routing protocols have their own limitations on Quality-of-Service (QoS) parameters like reliability, security, cost effectiveness, timely delivery etc. We propose a particle swarm optimization (PSO) algo

Cloud storage systems consist of storage servers providing long-term storage services worldwide over the Internet. Storing data in a user's cloud system causes serious concern over data confidentiality. A general sharing scheme protects data confidentiality, but also restricts the functionality of the cloud data storage system because at

Cloud Computing allows customers to access cloud resources and services. On-demand, self-service and pay-by-use business models are adapted from the cloud resource sharing process. Service level agreements (SLAs) regulate the cost of the services that are provided for the customers. Cloud data centers are employed to share data values to the u

Localized multi-objective quality of service (QoS) routing protocols for WSN applications handle different types of data traffic. For link quality, distributed, memory- and computation-efficient mechanisms were employed, and for reliability, a multi-sink single-path approach was used. Though QoS-aware dynamic routing for data integrity was ensured in WS

Routing in wireless sensor networks requires careful investigation of the sensor node locations and the path of communication. The sensor nodes are located by the routing protocol defined in the wireless sensor network. Efficient data collection strategies significantly improve the throughput and data integrity in wireless sensor network communic

Diabetes is fast gaining the status of a potential epidemic in India with more than 62 million diabetic individuals currently diagnosed with the disease. It is also a growing issue throughout the world. When medical history of a patient is available, it is very much possible to predict diabetes and its trends and hence provide due medical as

Low power lossy networks are going to be an integral part of the Internet of Things. Traditional routing protocols are not going to work in an IoT environment because of the resource scarcity present in the majority of connected devices. The communication range of these devices is short, and their memory and processing power are also limited. It creates a nee