Digital Archives Initiative
Memorial University - Electronic Theses and Dissertations 4
Document Description
Title
A bottom-up approach for XML document classification
Author
Wu, Junwei.
Description
Thesis (M.Sc.)--Memorial University of Newfoundland, 2009. Computer Science
Date
2009
Pagination
viii, 64 leaves : ill.
Subject
Data mining; XML (Document markup language)--Classification;
Degree
M.Sc.
Degree Grantor
Memorial University of Newfoundland. Dept. of Computer Science
Discipline
Computer Science
Language
Eng
Notes
Includes bibliographical references (leaves 61-64)
Abstract
Extensible Markup Language (XML) is a simple and flexible text format derived from Standard Generalized Markup Language (SGML) [1]. It has been widely accepted as a crucial component of many information-retrieval applications, such as XML databases and web services. One reason for its wide acceptance is that its format can be customized for data transmission or storage. Classification is an important data mining task that aims to assign unknown objects to the classes that best characterize them. In this thesis, we propose a method to classify XML documents under the assumption that they do not share a common schema, and that a schema may or may not be available, which is closer to real-world cases. Our method is similarity-based. Its main characteristic is the way it handles the roles played by the text and the structural information. Unlike most existing methods, we use a bottom-up approach: we start from the text and then embed the structural information. This is based on the observation that, in XML documents with diversified tag structures, the most informative content is carried by the terms in the text. Our experiments show that this strategy can achieve better performance than the existing methods for documents from sources that exhibit heterogeneous structures.
Type
Text
Resource Type
Electronic thesis or dissertation
Format
Image/jpeg; Application/pdf
Source
Paper copy kept in the Centre for Newfoundland Studies, Memorial University Libraries
Local Identifier
a3243873
Rights
The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Collection
Electronic Theses and Dissertations
Scanning Status
Completed
PDF File
(8.26 MB) -- http://collections.mun.ca/PDFs/theses/Wu_Junwei.pdf
CONTENTdm file name
41293.cpd