Web Service to Retrieve and Semantically Enrich Datasets for Theses From Open Educational Repositories

Authors Delia Arrieta Díaz Eduardo Lopez Dominguez Ismael Everardo Bárcenas Patiño Jorge de la Calleja Mora María Auxilio Medina Nieto Mireya Tovar Vidal Paulo Daniel Vázquez Mora

License CC-BY-4.0

Received August 17, 2020, accepted September 7, 2020, date of publication September 18, 2020,
date of current version September 30, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3024614




Web Service to Retrieve and Semantically Enrich Datasets for Theses From Open Educational Repositories
MARÍA AUXILIO MEDINA NIETO¹, PAULO DANIEL VÁZQUEZ MORA²,
JORGE DE LA CALLEJA MORA¹, MIREYA TOVAR VIDAL³,
EDUARDO LÓPEZ DOMÍNGUEZ⁴, (Member, IEEE),
DELIA ARRIETA DÍAZ⁵, AND
ISMAEL EVERARDO BÁRCENAS PATIÑO⁶
¹Postgraduate Department, Polytechnic University of Puebla, Puebla 72640, Mexico
²Information Technologies Department, Technological University of Puebla, Puebla 72300, Mexico
³Faculty of Computer Science, Benemerita Universidad Autonoma de Puebla, Puebla 72570, Mexico
⁴Department of Computer Science, Laboratorio Nacional de Informática Avanzada (LANIA), Xalapa 91090, Mexico
⁵Faculty of Economics, Accountability and Management, Juarez University of the State of Durango, Durango 34000, Mexico
⁶Computing Engineering Department, National University of Mexico, Coyoacan 04510, Mexico

Corresponding author: María Auxilio Medina Nieto (maria.medina@uppuebla.edu.mx)



  ABSTRACT The paper describes the design and implementation of a semantic web service that retrieves theses and extends the keyword-based search of a DSpace repository by taking into account the roles of advisors and steering committee members, which are formally represented in a custom-made ontology. The service uses SPARQL queries and the RDF serialization module of DSpace, which links the item submission process to the ontology; thus, the more theses are added to the repository, the more instances are inserted into the ontology. The paper provides empirical insights into how to reuse theses metadata and includes the results of an exploratory, self-administered survey for the heuristic usability evaluation of a web site that provides access to the proposed service. Heuristics were estimated with a purposive sample of students, teachers, and managers; the results indicated a high satisfaction level and showed that the service increased the accessibility of theses in the web environment. The service also generates semantically enriched datasets that coexist with the repository; these datasets are useful and valuable to educational organizations because they give them institutional visibility.


  INDEX TERMS Educational technology, semantic web, web services, information retrieval.


I. INTRODUCTION
Scientific and academic communities use repositories, which are technological platforms designed to store and preserve digital documents; a repository serves either as a dissemination medium (green path) or as a medium to publish the produced contents (golden path) [1]. An open educational repository (OER) or institutional repository (IR) is a set of services that educational organizations offer to the community to gather and manage digital documents of any type through the creation of open, interoperable, and organized collections that use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to ensure the visibility and impact of these organizations. Detailed information about this protocol is given in [2].

The impact of OERs is known and reported in the literature and in databases such as the Directory of Open Access Repositories (OpenDOAR) [3], where the share of technological platforms for OERs is as follows: DSpace (40%) [7]; EPrints (11%) [6]; WEKO (8%), a platform developed by the National Institute of Informatics (NII), Japan (more information about WEKO is available at https://weko.wou.edu.my); Digital Commons (5%), a cloud-hosted institutional repository software (https://bepress.com/products/digital-commons/); Islandora (3%), a free open-source software framework designed to manage and discover digital assets (https://islandora.ca); CONTENTdm (2%), a "software as a service" (SaaS) platform to manage digital

The associate editor coordinating the review of this manuscript and approving it for publication was Farhana Jabeen Jabeen.

                     This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020                                                                                                                                                        171933
                                                                      M. A. Medina Nieto et al.: Web Service to Retrieve and Semantically Enrich Datasets




            FIGURE 1. Context for SW001 web service.




collections (https://www.oclc.org/en/contentdm.html); Open Publication Systems (OPUS) (2%), an open-source software package to create repositories that are compliant with the OAI-PMH protocol (https://oapus.nl); Hyper Articles en Ligne (HAL) (1%), an open archive where authors upload scholarly documents (http://hal.inria.fr); and dLibra (1%), a software used to build repositories of digital objects (https://dingo.psnc.pl/en/dlibra-en/dlibra-funkcje-en/), while the remaining percentage corresponds to tailored software. These platforms have specialized Graphical User Interfaces (GUIs) and information retrieval services, and the majority use relational models to store metadata [4], [5].

EPrints [6] and DSpace [7] have open licenses, so they are widely used in libraries, universities, and cultural heritage organizations; both support the management of digital documents such as master's and Ph.D. theses. In Mexico, at the time of writing, the National Repository [8] collects data from 105 open repositories, and more than 90% of them use a version of DSpace.

From the authors' point of view, this paper identifies the following problems related to the management of theses: 1) information retrieval only distinguishes between the creator and contributors, that is, it is not possible to establish a role for authors such as the first author, advisor, or steering committee members; 2) there is ambiguity in the use of descriptive data (or metadata) during the item submission process, for example, about what kind of information is expected for the contributor element; and 3) the interpretation of data exported from repositories is left to end users and depends on each educational organization.

Previously, at the Polytechnic University of Puebla (UPPue), an ontology called Onto4AIR was designed to formally represent domain and operative knowledge for Mexican OERs, focusing on documents, users, and their relationships [9]. A newer version, called Onto4UPPue, is used for the semantic web service described in this paper, hereafter called SW001. This service has been designed to tackle the problems above: it extends the keyword-based search by taking into account the roles of advisors, other steering committee members, and semantic information stored in the Onto4UPPue ontology, and it generates semantically enriched datasets that can be downloaded for further analysis. The definition of the ontology concept is adopted from [10]. Figure 1 illustrates the context of SW001.

The feasibility of the service is justified by applying and testing it in the repository of the UPPue university (UPPue-IR), which runs on version 6.2 of DSpace; the adaptation to other DSpace repositories will be possible with simple updates. The paper's contributions lie in the design of a web site that allows users to access SW001 and of a web application to gather data for heuristic usability testing. We expect the development methodology for SW001 to be reusable, directly or with slight modifications, by OER managers who seek ways to exploit RDF data enriched with





ontologies; detailed information about RDF is presented in [11]. This is especially true for repositories where the application of metadata schemas is kept to a minimum.

The paper is organized as follows. Section II reviews related work that offers similar or alternative solutions. Section III gives a detailed view of SW001. Section IV presents the results of a heuristic usability test. Finally, Section V concludes with a summary of the current work along with further research perspectives.

II. RELATED WORK
Web data integration and extraction using semantic technologies have been extended to many knowledge areas, as illustrated in [12], where ecological data from the Guanabara Bay ecosystem are stored in distinct relational databases, which results in heterogeneity, a lack of metadata standardization, and reduced interoperability. To tackle these problems, a four-level architecture is proposed to integrate, publish, and retrieve ecological data from repositories using linked data; data are published as RDF triples using a relational-to-RDF (Resource Description Framework) mapping language and an application ontology that integrates a common vocabulary and a global view of the generated datasets. Data views are related to queries over data sources; they associate mappings and query answering to represent workflows for scientists.

In the domain of OERs, the use of metadata and ontologies has been studied from different perspectives. For example, [13] analyzes relationships between collection-level and item-level metadata and proposes a general method for translating them into a set of statements in first-order logic and formal knowledge representation languages using logical inference rules, while [14] proposes an agile method that minimizes the need for expertise when semi-structured data are used and analyzes ontology-based methodologies for integrating and reconciling information, since ontologies deal with syntactic and semantic heterogeneity. Reference [15] proposes a standards-compliant approach that involves a set of mappings between domain vocabularies to transform data in a DSpace repository into linked open datasets.

Reference [16] applied semantic search techniques to digital repositories supported by DSpace; its results showed that this type of search is useful to expert and novice users and that it enabled new alternatives for content browsing and retrieval in comparison with keyword-based search. Reference [4] proposes ontology-facilitated data sharing as an alternative to integrate the metadata of different IRs; the method consists of transforming data from relational databases of OERs into ontologies that users query from a single web page. A specialization of this proposal is reported in [5], where the authors describe a system that transforms the DSpace metadata database into an intermediate database with a normalized schema, which is then transformed into an ontology; the aim is to share information with other systems to discover common interests. The ontology obtained from the OER and the ontologies of other systems are integrated by means of semantic correspondences between entities.

In [17], the authors report three methods for exporting data about scientific activity stored in the Current Research Information System (CRIS) at the University of Novi Sad; this system implements the Common European Research Information Format (CERIF). One of these methods involves the OAI-PMH protocol and the OAICat library. The exported data generate datasets that enable the creation of graphs that show links between departments, faculties, and researchers; these links are useful to find research leaders and common research areas. The authors of [18] propose a user-centered approach that uses the information of a learning management system to customize Web Ontology Language (OWL) ontologies to meet the specific needs of disabled students; the ontologies are used to access, index, and retrieve heterogeneous items stored in a DSpace repository. More information about OWL is available in [34].

Reference [19] describes an ontology that models students, teachers, monographs, and graduation documents, as well as student-advisor, student-topic, and student-assessment committee relationships, with domain, range, and cardinality restrictions. Based on interviews with the staff of different universities, the ontology integrates a set of rules that formally model the two stages of the student's graduation process.

To manage a huge amount of data and web services, [35] presents a model that uses multithreading technology with dataset parameters to combine web services with parallel processing on computers that form a wide area network. A framework for automatic semantic web service composition that also uses parallel execution of processes during preprocessing is described in [36].

In [20], scholarly resources are harvested from various federated repositories; then, a common learning object ontology is used for subject classification to automatically annotate learning objects using keyword expansion, inferences, and standard taxonomic vocabularies expressed in SKOS. Reference [21] presents a literature review of applications of semantic technologies in bibliographic databases, such as the definition of semantic models for bibliographic descriptions, approaches to transform existing bibliographic data into machine-readable datasets, data enrichment to facilitate web search and information extraction, and the construction of knowledge repositories.

Finally, another application of semantic search is the Arabic semantic search engine based on a domain-specific ontological graph for the Colleges of Applied Science, Sultanate of Oman (CASOnto), described in [22]; this engine supports factoid question answering and offers both keyword-based and semantic-based search in Arabic and English. A comparative study showed that semantic-based search performs better than keyword-based search for both simple and complex queries; the performance and efficiency of this engine are compared with Kngine, Wolfram Alpha, and Google.





           FIGURE 2. Modules of DSpace.




                      FIGURE 3. High level design for SW001 web service.




III. DESCRIPTION OF SEMANTIC WEB SERVICE
Some behaviors and technical recommendations of the Confederation of Open Access Repositories (COAR) Next Generation Repositories are the following [23]: expose identifiers, declare licenses at the level of resources, discover information through browsing, and expose standardized usage metrics (COAR, 2017). This section gives a detailed view of the SW001 service. Its development used the cascade (waterfall) model as the baseline; then, an excerpt of the incremental model, based on small modules, fast functionality tests, feedback, correction, release, or scaling, was used until the implementation stage was finished; the last stage refers to maintenance.

A. BASIC REQUIREMENTS
The main requirements for SW001, the proposed semantic web service, are the following: a) store RDF triples, b) query semantic information, c) export data in open formats, and d) integrate data from OERs. A local instance of version 6.2 of DSpace that emulates the technical requirements of the UPPue repository was installed; the DSpace modules related to these requirements are shown in Figure 2. Note that the RDF module is not enabled in the default DSpace installation.

B. DESIGN
Figure 3 shows the high-level design for SW001; its use cases and its class diagram are illustrated in Figures 4 and 5, respectively, and detailed information about these classes and their attributes is given in [24]. By reusing the information in these figures, similar services can be constructed.

C. IMPLEMENTATION
SW001 is a REST-type service. Its implementation and feasibility are explained through a scenario that uses a test set of theses described with the Dublin Core metadata format [25] and the Java Server Pages (JSP) interface of DSpace to export the following metadata: id, collection, author, accessioned, available, issued, abstract, provenance, sponsorship, description, citation, URI, iso, publisher, subject, alternative title, title, and type. These metadata are stored in a Comma Separated Values (CSV) file.
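The step from the exported CSV file to RDF can be illustrated with a minimal sketch in Python, the language the authors use for their verification module. It is not SW001's implementation, which relies on Jena-Fuseki and the RDFizer service; it only shows the idea of turning each metadata cell into a Dublin Core triple in N-Triples syntax. The column names (`dc.identifier.uri`, `dc.contributor.advisor`, and so on), the `||` separator for multi-valued cells, and the example.org item URI are assumptions for illustration.

```python
import csv
import io
from typing import List

# Dublin Core terms namespace.
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping from DSpace CSV column names to Dublin Core terms.
# Mapping the advisor column to the generic dcterms:contributor flattens the
# advisor role, which illustrates the creator/contributor limitation.
COLUMN_TO_PREDICATE = {
    "dc.title": DCTERMS + "title",
    "dc.contributor.author": DCTERMS + "creator",
    "dc.contributor.advisor": DCTERMS + "contributor",
    "dc.date.issued": DCTERMS + "issued",
    "dc.subject": DCTERMS + "subject",
    "dc.type": DCTERMS + "type",
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, separator: str = "||") -> List[str]:
    """Turn each metadata value of each CSV row into one N-Triples line."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        uri = (row.get("dc.identifier.uri") or "").strip()
        if not uri:
            continue  # without an item URI there is no subject for the triples
        for column, predicate in COLUMN_TO_PREDICATE.items():
            for value in (row.get(column) or "").split(separator):
                value = value.strip()
                if value:  # skip empty cells and columns absent from the export
                    lines.append(f'<{uri}> <{predicate}> "{escape_literal(value)}" .')
    return lines


# Hypothetical one-row export with two advisors in a multi-valued cell.
SAMPLE_CSV = (
    "id,dc.identifier.uri,dc.title,dc.contributor.author,"
    "dc.contributor.advisor,dc.date.issued\n"
    '1,http://example.org/handle/123456789/100,Example thesis,'
    '"Perez, Ana","Lopez, Juan||Ruiz, Eva",2019\n'
)

if __name__ == "__main__":
    for line in csv_to_ntriples(SAMPLE_CSV):
        print(line)
```

Writing N-Triples by hand keeps the sketch dependency-free; in practice a library such as RDFLib, which the authors use, would take care of parsing, escaping, and serialization.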





                 FIGURE 4. Cases of use for SW001 web service.




                      FIGURE 5. Class diagram for SW001 web service.


By using the Jena-Fuseki server version 1.6.0 and the RDFizer service, SW001 transforms metadata into RDF documents that form a triplestore. On the one hand, Jena-Fuseki is integrated with TDB, a component that provides persistent storage, queries, and transactions for RDF documents (Apache Software Foundation, 2020). On the other hand, RDFizer implements extraction, transformation, and loading. The RDF document for each thesis is accessible through a Uniform Resource Locator (URL) such as the following:

 http://repositorio.uppuebla.edu.mx:8080/rdf/handle/123456789/100/ttl

A software module written in Python uses the RDFLib library, which works with XML and RDF files [26], to verify that the metadata from DSpace were correctly included in the RDF documents. The data of these documents are semantically enriched and modeled as instances of the Onto4UPPue ontology, which represents documents, users, and their relationships for OERs; its intended use is the establishment of a common vocabulary that provides the foundations for the deployment of semantic web services. Reviewing the correct insertion of instances into the ontology required version 5.2 of the Protégé editor [27]. Figure 6 shows the instances of the test set and some data properties for a thesis identified as instance T24; note that instances are associated with DataPropertyAssertion properties, and thus they model DSpace metadata. The manager of the OER at the UPPue validated the insertion





                   FIGURE 6. Modeling theses as instances of Onto4UPPue ontology.




   FIGURE 7. Home page to access SW001 service.


process; the Hermit version 1.3.8 and Pellet version 2.3.0 plug-ins were used for automatic verification of logical consistency.

The instances for theses are linked with other instances that represent students, advisors, and steering committee members by using the object properties of the Onto4UPPue ontology. As a result, the implementation of the SW001 service offers the following benefits: 1) this new semantically enriched dataset is used in a web site that extends the keyword-based search of DSpace and allows users to retrieve information about the elaboration process of theses; 2) the logical consistency of this dataset has been automatically validated; 3) the use of a common vocabulary reduces the ambiguity of the Spanish language; and 4) this dataset can be exported into OWL or JSON formats with XML syntax for further analysis. However, there are some drawbacks, such as the fact that managing huge datasets will increase the response times; hence, new solutions using parallel processing will be required, as well as different alternatives to manage collections.

At present, the SW001 service is accessible from a web site composed of six pages in Spanish, described as follows:
• Home. The initial web page, which includes a welcoming message and the goal of the service, as illustrated in Figure 7.
• Semantic search. The web page for the semantic search of theses, as illustrated in Figure 8.
• Test UX. Displays a test that uses usability heuristics to allow users to evaluate the Semantic search page; see Figure 10.





       FIGURE 8. The web page for the semantic search of SW001 service.




    FIGURE 9. Display of a result set for a semantic search in tabular form.




• Export data. Enables users to export the ontology with its instances in JSON and OWL formats.
• Frequent questions. Shows frequently asked questions and answers to support knowledge acquisition of the OER domain.
• Contact. Contains the contact information of the developers.

A personal computer and the Google Chrome web browser were used to display Figures 7 to 10. Because the web site was designed to be responsive, it can also be accessed from a mobile device, as illustrated in Figure 11.

To add other relationships from the Onto4UPPue ontology, developers need to make simple modifications to the source code. In summary, the SW001 service exports and enriches the bibliographic metadata of the UPPue repository as linked open data.

The technologies used to implement SW001 are illustrated in Figure 12; some versions are 3.6 for Python, 4.2.2 for the RDFLib library (RDFLib team, 2013), and 2.4.15 for AdminLTE. The information of the Onto4UPPue ontology was extracted with xml.etree.ElementTree (available at





   FIGURE 10. Display of a UX test to evaluate the semantic search page.




                     FIGURE 11. Access to SW001 from a mobile device.




https://pypi.org/project/elementtree/), while the insertion of new instances was supported by OWLReady2 version 0.19 [29]; OWLReady2 is compatible with RDFLib and can be linked with the Hermit reasoner. The Flask framework (available at https://pypi.org/project/Flask/) was used to implement the Model-View-Controller design pattern, whereas the dashboard for the front end is provided by AdminLTE version 2.4.15.

Section IV presents the preliminary results of a survey that evaluates the usability of the described web site.

IV. USABILITY HEURISTIC EVALUATION
An exploratory, self-administered survey was conducted to evaluate the usability of the web site that allows users to access the SW001 service. The survey adopted the Torres-Budiel heuristics template [30], whose application is reported in [31]–[33]. The heuristics are the following: generalities, identity and information, language and readability, labels, structure and browsing, structure of the web pages, searching, multimedia elements, help, accessibility, and control and feedback.







                             FIGURE 12. Technologies for the front-end and back-end of SW001 service.




             FIGURE 13. Participants of the usability testing.




Each heuristic is associated with a set of questions; for example, the questions for the language heuristic are the following [30]:
1) Is the web site in the same language as its users?
2) Are the language and wording concise and clear?
3) Are the language and wording friendly, familiar, and close?
4) Does each paragraph express a single idea?

These heuristics were estimated with a purposive sample of students, teachers, and managers. Sixteen undergraduate students, selected from a group of 30 computer science students, participated in the survey; these students are frequent users of the UPPue repository, 10 men and 6 women between 21 and 24 years old. The students accessed the web site using personal computers during a face-to-face session of about 30 minutes. Two teachers and two repository managers participated in the analysis of the results. The students expressed a level of satisfaction for each heuristic using the questionnaire included in the TestUX page, according to the following Likert scale:
• It is not applicable (0)
• Strongly disagree (1)
• Disagree (2)
• Neither agree nor disagree (3)
• Agree (4)
• Strongly agree (5)

Table 1 shows the average for each heuristic; note that the heuristic with the maximum value is accessibility. The questions related to accessibility are the following [30]:
1) Do the images include the 'alt' attribute that describes their content?
2) Is the web site compatible with different browsers? Is it visible with different screen resolutions?
3) Can users browse the whole web site without downloading any plug-in?
4) Has the weight of the web site been controlled?
5) Can the web site be printed without any problems?
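The per-heuristic averages in Table 1 and the overall average come from aggregating Likert answers by heuristic. The following Python sketch shows one way to perform that aggregation. Two details are assumptions, because the text does not state them: answers coded 0 (not applicable) are left out of the averages, and the overall value is the unweighted mean of the per-heuristic means. The responses below are invented placeholders, not the survey data.

```python
from statistics import mean

NOT_APPLICABLE = 0           # Likert code for "It is not applicable"
VALID_ANSWERS = range(0, 6)  # 0 (not applicable) to 5 (strongly agree)


def heuristic_means(responses):
    """Mean Likert score per heuristic, ignoring not-applicable answers.

    `responses` maps a heuristic name to the list of answers it received.
    Heuristics whose answers are all "not applicable" are omitted.
    """
    means = {}
    for heuristic, answers in responses.items():
        if any(a not in VALID_ANSWERS for a in answers):
            raise ValueError(f"answer outside the 0-5 scale for {heuristic!r}")
        scored = [a for a in answers if a != NOT_APPLICABLE]
        if scored:
            means[heuristic] = mean(scored)
    return means


def overall_mean(means):
    """Unweighted mean of the per-heuristic means (an assumed aggregation)."""
    return mean(means.values())


# Invented placeholder answers for three of the eleven heuristics.
EXAMPLE_RESPONSES = {
    "help": [2, 3, 0, 3],
    "accessibility": [5, 5, 4, 5],
    "labels": [4, 4, 3, 5],
}

if __name__ == "__main__":
    per_heuristic = heuristic_means(EXAMPLE_RESPONSES)
    for name, value in sorted(per_heuristic.items(), key=lambda kv: kv[1]):
        print(f"{name}: {value:.2f}")
    print("highest:", max(per_heuristic, key=per_heuristic.get))
    print("lowest:", min(per_heuristic, key=per_heuristic.get))
    print(f"overall: {overall_mean(per_heuristic):.2f}")
```

Excluding 0 answers keeps "not applicable" responses from pulling an average toward zero; if the original survey counted them as scores, the resulting averages would differ.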





TABLE 1. Results for each usability heuristic.

   The average of the 11 heuristics was 4.02, which is very close to the "agree" value (4) of the Likert scale. Thus, the students reported a high level of satisfaction with the usability of the web site. For the heuristic with the lowest value, help, a frequently asked questions (FAQ) section and links to the help section were added after the usability test.
   Figure 13 shows photographs of students during the usability test.

V. CONCLUSION
The paper outlines concepts related to OERs and describes SW001, a web service that retrieves theses and extends the keyword-based search of the UPPUE repository. It does so by taking into account the roles of advisors and steering committee members, which are formally represented in the Onto4UPPUE ontology. The proposed REST-type service was designed and implemented taking into account scalability, transparency, and some technical requirements of COAR. The service uses a triple store, SPARQL queries, and technological tools such as the RDF serialization module of the DSpace platform. In this way, it links the repository and the Onto4UPPUE ontology: the more theses and descriptive data are inserted into the repository, the more instances are integrated into the ontology.
   On the one hand, the use of the ontology reduces ambiguity in the interpretation of descriptive data. Its vocabulary is shared between users and computers, and it provides a means to automatically detect inconsistencies in the data. On the other hand, the ontology supports the generation of semantically enriched datasets that coexist with the repository.
   The functionality of SW001 was verified through a software module that checks that all metadata exported from DSpace were integrated into ontology instances. The Protégé editor was used to explore information about theses, and the logical consistency of the ontology was checked with reasoners.
   A web site was designed to access SW001 from web browsers. It allows users to retrieve content in Spanish by using relationships between steering committee members. However, the development methodology enables developers to integrate other ontological relationships and similar services through simple updates to the source code.
   The TestUX web page was built for usability testing. Sixteen undergraduate computer science students who are frequent users of the UPPUE repository participated in the testing, and data on 11 usability heuristics were gathered. The results, presented by heuristic, indicate that the students reported a high level of satisfaction. After the testing, some support elements were added to the web site because the help heuristic obtained the lowest value.
   From the web site, users can download the ontology in JSON and OWL formats, so it can be reused by other semantic web applications. Furthermore, reasoning will enable the development of intelligent applications and further exploitation of the generated datasets, for instance, to infer implicit information.
   The paper provides a basis for the development of semantic services, which contributes to spreading the benefits of open access and to the construction of semantically enriched datasets. Current work focuses on integrating any type of document stored in the repository as an ontology instance. As future work, we plan to design semantic web services that obtain output indicators for authors to support decision-makers in the OER domain.

REFERENCES
[1] J. A. Merlo, Ecosistemas del Acceso Abierto. Salamanca, Spain: Ediciones Universidad de Salamanca, 2018.
[2] C. Lagoze, H. Van de Sompel, M. Nelson, and S. S. Warner, "Implementation guidelines for the open archives initiative protocol for metadata harvesting," Open Arch. Initiative, Ithaca, NY, USA, Tech. Rep. 2005/05/03T22:51:00Z, 2005. [Online]. Available: http://www.openarchives.org/OAI/2.0/guidelines.htm
[3] Jisc Open Access Services. (2020). OpenDOAR: Directory of Open Access Repositories. [Online]. Available: http://v2.sherpa.ac.uk/opendoar/
[4] H. Fari and S. Khan, Exchanging Data From Institutional Repositories to the Semantic Web. Riga, Latvia: Lap Lambert Academic Publishing, 2015.
[5] H. Farid, S. Khan, and M. Y. Javed, "DSont: DSpace to ontology transformation," J. Inf. Sci., vol. 42, no. 2, pp. 179–199, Apr. 2016, doi: 10.1177/0165551515591406.
[6] M. R. Beazley, "Eprints institutional repository software: A review," Partnership, Can. J. Library Inf. Pract. Res., vol. 5, no. 2, pp. 1–6, Aug. 2018, doi: 10.21083/partnership.v5i2.1234.
[7] M. Smith, M. Barton, M. Bass, M. Branschofsky, G. McClellan, D. Stuve, R. Tansley, and J. H. Walker, "DSpace: An open source dynamic digital repository," D-Lib Mag., vol. 9, no. 1, pp. 87–97, 2003.
[8] Mexican Government. National Council of Science and Technology. (2020). National Repository. [Online]. Available: https://www.repositorionacionalcti.mx
[9] M. A. Medina, J. A. Sánchez, O. Cervantes, R. C. Medina, J. De la Calleja, and A. Benitez, "Semantic representation of operative and domain knowledge for institutional repositories," Mexican Copyright Nat. Inst., Mexico City, Mexico, Tech. Rep. 03-2017-042511235500, 2017, pp. 1–9.
[10] T. R. Gruber, "Toward principles for the design of ontologies used for knowledge sharing?" Int. J. Hum.-Comput. Stud., vol. 43, nos. 5–6, pp. 907–928, Nov. 1995, doi: 10.1006/ijhc.1995.1081.
[11] F. Gandon and G. Schreiber. (Feb. 2014). RDF 1.1 XML syntax: W3C Recommendation. [Online]. Available: http://www.w3.org/TR/rdf-syntax-grammar/
[12] A. M. D. C. Moura, F. Porto, V. Vidal, R. P. Magalhaes, M. Maia, M. Poltosi, and D. Palazzi, "A semantic integration approach to publish and retrieve ecological data," Int. J. Web Inf. Syst., vol. 11, no. 1, pp. 87–119, Apr. 2015, doi: 10.1108/IJWIS-08-2014-0028.
[13] K. Wickett, "A logic-based framework for collection/item metadata relationships," J. Documentation, vol. 74, no. 6, pp. 1175–1189, Oct. 2018, doi: 10.1108/JD-01-2018-0017.
[14] B. A. Casas and G. H. Ceballos, "Integrating semi-structured information using semantic technologies: An evaluation of tools and a case study on university rankings data," in Proc. 3rd Int. Conf. Data Manage. Technol. Appl. (DATA), M. Helfert, A. Holzinger, and C. Francalanci, Eds. Setúbal, Portugal: ScitePress, 2014, pp. 357–364, doi: 10.5220/0005004203570364.





[15] N. Konstantinou, D.-E. Spanos, N. Houssos, and N. Mitrou, "Exposing scholarly information as linked open data: RDFizing DSpace contents," Electron. Library, vol. 32, no. 6, pp. 834–851, Nov. 2014, doi: 10.1108/EL-12-2012-0156.
[16] G. Solomou and D. Koutsomitropoulos, "Towards an evaluation of semantic searching in digital repositories: A DSpace case-study," Program, vol. 49, no. 1, pp. 63–90, Feb. 2015, doi: 10.1108/PROG-07-2013-0037.
[17] L. Ivanovic, D. Ivanovic, and S. B. Dimic, "Improving dissemination of human knowledge by exporting data from research information systems," in Proc. Manage., Knowl. Learn. Int. Conf., Portorož, Slovenia: ToKnowPress, 2015, pp. 711–718. [Online]. Available: https://ideas.repec.org/h/tkp/mklp14/711-718.html
[18] C. Skourlas, A. Tsolakidis, P. Belsis, D. Vassis, A. Kampouraki, P. Kakoulidis, and G. A. Giannakopoulos, "Integration of institutional repositories and e-learning platforms for supporting disabled students in the higher education context," Library Rev., vol. 65, no. 3, pp. 136–159, 2016, doi: 10.1108/LR-08-2015-0088.
[19] G. A. P. García, "Aplicación de la Lógica de predicados para la Realización de inferencias en documentos de Pasantía descritos a Nivel Ontológico," Ingenierías USBMed, vol. 7, no. 1, pp. 11–19, Jun. 2016, doi: 10.21500/20275846.1802.
[20] D. A. Koutsomitropoulos, "Semantic annotation and harvesting of federated scholarly data using ontologies," Digit. Library Perspect., vol. 35, nos. 3–4, pp. 157–171, Nov. 2019, doi: 10.1108/DLP-12-2018-0038.
[21] T. Georgieva-Trifonova, K. Zdravkov, and D. Valcheva, "Application of semantic technologies in bibliographic databases: A literature review and classification," Electron. Library, vol. 38, no. 1, pp. 113–137, Dec. 2019, doi: 10.1108/EL-03-2019-0081.
[22] A. Sayed and A. Al Muqrishi, "CASONTO: An efficient and scalable Arabic semantic search engine based on a domain specific ontology and question answering," Int. J. Web Inf. Syst., vol. 12, no. 2, pp. 242–262, Jun. 2016, doi: 10.1108/IJWIS-12-2015-0047.
[23] COAR Building a Global Knowledge Commons. (2017). Next Generation Repositories: Behaviors and Technical Recommendations of the COAR Next Generation Repositories Working Group. Accessed: Jul. 20, 2018. [Online]. Available: https://www.coar-repositories.org/files/NGR-Final-Formatted-Report-cc.pdf
[24] D. Vázquez, M. A. Medina, J. De la Calleja, A. Benitez, T. M. Vidal, and D. Alanís, "Diseño de un servicio Web para la recuperación de información semántica del repositorio institucional de la Universidad Politécnica de Puebla," in Proc. 33rd Congreso de Instrumentación (SOMI), 2018, vol. 5, no. 1.
[25] Dublin Core Metadata Initiative. (2014). DCMI Metadata Terms. [Online]. Available: http://www.dublincore.org/specifications/dublin-core/dcmi-terms/
[26] RDFLib Team. (2020). RDFLib 5.0.0. [Online]. Available: https://rdflib.readthedocs.io/en/stable/
[27] M. A. Musen, "The Protégé project: A look back and a look forward," AI Matters, vol. 1, no. 4, pp. 4–12, 2015, doi: 10.1145/2757001.2757003.
[28] E. Prud'hommeaux and A. Seaborne. (2008). SPARQL Query Language for RDF: W3C Recommendation 15 January 2008. [Online]. Available: https://www.w3.org/TR/rdf-sparql-query/
[29] J.-B. Lamy. (2019). Owlready2 Version 0.24. Python Package Index. [Online]. Available: https://pypi.org/project/Owlready2/
[30] Torres Burriel Estudio. (2008). Test Heurístico De Torres Burriel. [Online]. Available: http://www.torresburriel.com/weblog/2008/11/28/plantilla-para-hacer-analisis-heuristicos-de-usabilidad/
[31] G. E. C. Golondrino, D. P. Oliveros, and M. Y. C. Muñoz, "Automation of usability inspections for websites," in Human-Computer Interaction (Communications in Computer and Information Science), vol. 1114, H. P. Ruiz and V. Agredo-Delgado, Eds. Puebla, Mexico: Springer, 2019.
[32] O. D. Pérez, G. E. Chanchi, and M. I. Vidal, "Propuesta de un test heurístico de accesibilidad para sitios Web basados en la norma NTC 5854," Revista Ibérica de Sistemas e Tecnologias de Informação, vol. 17, no. 1, pp. 170–182, 2019.
[33] E. Serna, Desarrollo e innovación en ingeniería. Instituto Antioqueño de Investigación, 2018.
[34] P. F. Patel-Schneider, P. Hayes, and I. Horrocks. OWL Web Ontology Language Semantics and Abstract Syntax. Accessed: Jun. 13, 2019. [Online]. Available: http://www.w3.org/TR/owl-semantics/
[35] A. Kut and D. Birant, "An approach for parallel execution of Web services," in Proc. IEEE Int. Conf. Web Services, Jul. 2004, pp. 812–813, doi: 10.1109/ICWS.2004.1314831.
[36] P. Bartalos and M. Bielikova, "Semantic Web service composition framework based on parallel processing," in Proc. IEEE Conf. Commerce Enterprise Comput., Vienna, Austria, Jul. 2009, pp. 495–498, doi: 10.1109/CEC.2009.27.

MARÍA AUXILIO MEDINA NIETO received the M.Sc. and Ph.D. degrees from the Universidad de las Americas Puebla, Mexico, in 2008. She is currently a Researcher with the Postgraduate Department, Polytechnic University of Puebla (UPPuebla), Puebla. She holds the higher profile recognition (PRODEP). Her research interests include ontologies, semantic web, and open educational repositories. She has been a member of the National System of Researchers (SNI) Level 1 since 2019.

PAULO DANIEL VÁZQUEZ MORA received the M.Sc. degree from the Polytechnic University of Puebla (UPPuebla), Puebla, Mexico, in 2019. He has worked as an experienced web developer in the public and private sectors since 2006. He is currently a part-time Professor with the Information Technologies Department, Technological University of Puebla (UTP), Puebla. His research interests include open access, information exchange, web services, and open educational repositories. He has received several certifications from the National Council of Normalization and Certification of Working Skills.

JORGE DE LA CALLEJA MORA received the M.Sc. and Ph.D. degrees from the National Institute of Astrophysics, Optics and Electronics (INAOE), Mexico. He has been a full-time Professor with the Computer Science Department, Polytechnic University of Puebla (UPPuebla), Mexico, since 2008. His research interests include machine learning, computer vision, and data mining with applications in medicine, education, and astronomy. He is a member of the National System of Researchers (SNI) Level 1.

MIREYA TOVAR VIDAL received the Ph.D. degree in computer science from CENIDET, Mexico, in 2015. She is currently a full-time Professor with the Faculty of Computer Science, Benemerita Universidad Autonoma de Puebla (BUAP). Her research interests include ontology, clustering, information retrieval, computational linguistics, logic, and natural language processing. She has been a member of the National System of Researchers (SNI) Level 1 since 2016.





EDUARDO LÓPEZ DOMÍNGUEZ (Member, IEEE) received the Ph.D. degree from the National Institute of Astrophysics, Optics and Electronics (INAOE), Mexico, in 2010. He is currently a Researcher with the Department of Computer Science, Laboratorio Nacional de Informática Avanzada (LANIA), Mexico. His research interests include mobile distributed systems, partial order algorithms, and multimedia synchronization. He is a member of the National System of Researchers (SNI) Level 1.

DELIA ARRIETA DÍAZ received the master's degree in quality of public management, the master's degree in gestalt therapy, and the Ph.D. degree in government and public management. She is currently a full-time Teacher in bachelor's and master's programs with the Faculty of Economics, Accountability, and Management, Juarez University of the State of Durango (UJED). She has coordinated the academic group called Management and Development of Organizations. She holds the higher profile recognition (PRODEP). She received the Teacher Certification from the National Association of Faculties and Schools of Accountability and Management (ANFECA).

ISMAEL EVERARDO BÁRCENAS PATIÑO received the Ph.D. degree in computer science from the University of Grenoble. He is currently an Assistant Professor with the Computing Engineering Department, National University of Mexico. His main research interests include the theory of automated reasoning and its application in areas such as knowledge representation and formal verification. He is a member of the National System of Researchers (SNI) Level 1.



