Scientific publications
- Félix Barzin, Gonzague Yernaux and Wim Vanhoof. 2023. Scrimmo: A Real-Time Web Scraper Monitoring the Belgian Real Estate Market. In 2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Venice, Italy, pp. 335-338.
Web scraping (or Web crawling), a technique for automated data extraction from websites, has emerged as a valuable tool for scientific research and data analysis. This paper presents a comprehensive exploration of Web scraping, its methodologies and challenges. The discussion revolves around a concrete application, namely the automatic extraction of data concerning the Belgian real estate market. We introduce SCRIMMO, a real-time Web scraper tailored to collect data from websites containing real estate classified ads. The tool is developed in a continuous iterative process and based on an innovative cloud architecture. The paper also briefly addresses the ethical aspects of Web scraping. By integrating insights from previous research and ethical guidelines, this study provides researchers with a comprehensive understanding of Web scraping and its potential benefits, while promoting responsible and ethical practices in data collection and analysis.
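As a rough illustration of the kind of extraction loop described above (not SCRIMMO's actual code; the URL, CSS selectors and field names below are hypothetical placeholders), a minimal scraper could look as follows:

```python
# Minimal sketch of a classified-ads scraper in the spirit of SCRIMMO.
# The URL and CSS selectors are hypothetical placeholders; a real scraper
# must also respect the target site's terms of use and robots.txt.
import time
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example-immo.be/listings?page={page}"  # hypothetical

def scrape_page(page: int) -> list[dict]:
    """Fetch one result page and extract basic fields from each ad card."""
    html = requests.get(BASE_URL.format(page=page), timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    ads = []
    for card in soup.select("div.listing-card"):  # hypothetical selector
        ads.append({
            "title": card.select_one("h2").get_text(strip=True),
            "price": card.select_one(".price").get_text(strip=True),
            "city": card.select_one(".city").get_text(strip=True),
        })
    return ads

if __name__ == "__main__":
    collected = []
    for page in range(1, 4):
        collected.extend(scrape_page(page))
        time.sleep(2)  # polite crawling delay
    print(f"Collected {len(collected)} ads")
```

The polite delay and the restriction to publicly listed fields are in line with the ethical considerations the paper raises.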
- Gonzague Yernaux & Wim Vanhoof. 2023. Predicate Anti-unification in (Constraint) Logic Programming. In: Glück, R., Kafle, B. (eds) Logic-Based Program Synthesis and Transformation. LOPSTR 2023. Lecture Notes in Computer Science, vol 14330. Springer, Cham.
- Gonzague Yernaux & Wim Vanhoof. 2023. A Dataflow Analysis for Comparing and Reordering Predicate Arguments. In Proceedings of the 39th International Conference on Logic Programming. EPTCS, pages 41-54.
In this work, which is done in the context of a (moded) logic programming language, we devise a data-flow analysis dedicated to computing what we call argument profiles. Such a profile essentially describes, for each argument of a predicate, its functionality, i.e. the operations in which the argument can be involved during an evaluation of the predicate, as well as how the argument contributes to the consumption and/or construction of data values. While the computed argument profiles can be useful for applications in the context of program understanding (as each profile essentially provides a way to better understand the role of the argument), they more importantly provide a way to discern between arguments in a manner that is more fine-grained than what can be done with other abstract characterizations such as types and modes. This is important for applications where one needs to identify correspondences between the arguments of two or more different predicates that need to be compared, such as during clone detection. Moreover, since a total order can be defined on the abstract domain of profiles, our analysis can be used to rearrange predicate arguments and order them according to their functionality, thus constituting an essential ingredient for predicate normalization techniques.
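To give a concrete, if drastically simplified, flavour of profile-based argument reordering (the tiny profile domain below is a toy stand-in, not the analysis developed in the paper):

```python
# Toy sketch of reordering predicate arguments by an abstract "profile".
# The profile domain (a role tag plus a construction flag) is a drastic
# simplification, assumed here only for illustration.
from dataclasses import dataclass

ROLE_ORDER = {"input": 0, "both": 1, "output": 2}  # assumed total order

@dataclass(frozen=True)
class Profile:
    role: str          # "input", "output" or "both"
    constructs: bool   # does the argument build new data values?

    def key(self) -> tuple:
        """Sort key inducing a total order on profiles."""
        return (ROLE_ORDER[self.role], self.constructs)

def normalize_arguments(args: list[tuple[str, Profile]]) -> list[str]:
    """Reorder a predicate's arguments according to their profiles, so that
    two predicates can afterwards be compared position by position."""
    return [name for name, prof in sorted(args, key=lambda a: a[1].key())]

# Example: append/3 with hypothetical profiles for its three arguments.
append_args = [
    ("Zs", Profile("output", constructs=True)),
    ("Xs", Profile("input", constructs=False)),
    ("Ys", Profile("input", constructs=False)),
]
print(normalize_arguments(append_args))  # ['Xs', 'Ys', 'Zs']
```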
- Rudy Kabimbi Ngoy, Gonzague Yernaux & Wim Vanhoof. 2023. EvscApp: Evaluating the Pedagogical Relevance of Educational Escape Games for Computer Science. In Proceedings of the 15th International Conference on Computer Supported Education - Volume 2: CSEDU; ISBN 978-989-758-641-5; ISSN 2184-5026, SciTePress, pages 241-251.
While there is consensus that educational escape games have a beneficial impact on student learning in computer science, this hypothesis is not empirically demonstrated because the evaluation methods used by researchers in the field are carried out in an ad hoc manner, lack reproducibility and often rely on confidential samples. We introduce EvscApp, a standard methodology for evaluating educational escape games intended for the learning of computer science at the undergraduate level. Based on a state of the art in the realm of educational escape games and on the different associated pedagogical approaches existing in the literature, we arrive at a general-purpose experimental process divided into fifteen steps. The evaluation criteria used for assessing an escape game’s efficiency concern the aspects of motivation, user experience and learning. The EvscApp methodology has been implemented as an open source Web dashboard that helps researchers carry out structured experiments on educational escape games designed to teach computer science. The tool allows designers of educational computer escape games to escape the ad hoc construction of evaluation methods while gaining in methodological rigor and comparability. All the results collected through the experiments carried out with EvscApp are scheduled to be compiled in order to rule empirically on the pedagogical effectiveness of educational escape games for computer science in general. A few preliminary experiments indicate positive early results of the method.
- Gonzague Yernaux & Wim Vanhoof. 2022. On Detecting Semantic Clones in Constraint Logic Programs. In IEEE 16th International Workshop on Software Clones (IWSC), Limassol, Cyprus, pp. 32-38.
Deciding whether two code fragments are semantic clones, or type-4 clones, is a problem with many ramifications. Current research often focuses on the problem in an imperative or object-oriented setting and most existing work uses abstract syntax trees, program dependency graphs, program metrics or text-based, token-based and machine learning-based approaches to identify semantic clones. In this work, we adopt a fundamentally different point of view and express clone detection as a search problem in a logic programming setting. Due to their restricted syntax and semantics, (constraint) logic programs are by nature simple and elegant candidates for automated analysis. After having formalized the clone detection problem at the level of predicates, we develop a study of the different parameters that come into play in the resulting framework. We try to identify the complexity issues involved in a general semantic clone detection procedure that essentially computes so-called most specific generalizations for predicates written in constraint logic programming (CLP). Even though well-known for basic structures such as literals and terms, generalization (or anti-unification) of more complex structures such as clauses and predicates has received very little attention. We show that anti-unification allows both to control the search and to guide the detection of cloned predicates. We pinpoint where efficient approximations are needed in order to be able to identify semantic code clones in a manageable time frame.
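For reference, the classical, well-understood case mentioned above, anti-unification of plain terms, can be sketched as follows; the nested-tuple term representation is an assumption made purely for this illustration:

```python
# Classical first-order anti-unification (least general generalization) of
# two terms, in the style of Plotkin/Reynolds.  Terms are nested tuples
# ("f", arg1, ..., argn); plain strings stand for constants or variables.
import itertools

_fresh = itertools.count()

def lgg(s, t, mapping):
    """Return a generalization of terms s and t.  `mapping` caches the
    variable introduced for each pair of disagreeing subterms, so repeated
    disagreements are generalized by the same variable."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # Same functor and arity: generalize argument-wise.
        return (s[0],) + tuple(lgg(a, b, mapping) for a, b in zip(s[1:], t[1:]))
    # Disagreement pair: replace by a (shared) fresh variable.
    if (s, t) not in mapping:
        mapping[(s, t)] = f"X{next(_fresh)}"
    return mapping[(s, t)]

# Example: lgg(f(a, g(a)), f(b, g(b))) = f(X0, g(X0))
print(lgg(("f", "a", ("g", "a")), ("f", "b", ("g", "b")), {}))
```

The difficulty the paper addresses starts where this sketch stops: lifting such generalization from terms to clauses and predicates while keeping the search tractable.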
- Gonzague Yernaux & Wim Vanhoof. 2022. Anti-Unification of Unordered Goals. In 30th EACSL Annual Conference on Computer Science Logic (CSL 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 216, pp. 37:1-37:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik.
Anti-unification in logic programming refers to the process of capturing common syntactic structure among given goals, computing a single, more general goal called a generalization of the given goals. Finding an arbitrary common generalization for two goals is trivial, but looking for those common generalizations that are either as large as possible (called largest common generalizations) or as specific as possible (called most specific generalizations) is a non-trivial optimization problem, in particular when goals are considered to be unordered sets of atoms. In this work we provide an in-depth study of the problem by defining two different generalization relations. We formulate a characterization of what constitutes a most specific generalization in both settings. While these generalizations can be computed in polynomial time, we show that when the number of variables in the generalization needs to be minimized, the problem becomes NP-hard. We subsequently revisit an abstraction of the largest common generalization when anti-unification is based on injective variable renamings, and prove that it can be computed in polynomially bounded time.
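A toy sketch of the unordered setting, pairing atoms greedily by predicate symbol and arity, is given below; it only conveys the flavour of a polynomial-time approximation and is not one of the generalization relations or algorithms studied in the paper:

```python
# Toy sketch of anti-unifying two *unordered* goals (sets of atoms).
# Atoms and terms are nested tuples ("p", arg1, ..., argn); plain strings
# stand for constants or variables.  The greedy pairing is illustrative only.
import itertools

_fresh = itertools.count()

def anti_unify(s, t, mapping):
    """Generalize two terms/atoms, sharing one disagreement mapping so that
    variable links are preserved across the whole goal."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(anti_unify(a, b, mapping)
                               for a, b in zip(s[1:], t[1:]))
    if (s, t) not in mapping:
        mapping[(s, t)] = f"X{next(_fresh)}"
    return mapping[(s, t)]

def generalize_goals(goal1, goal2):
    """Greedily pair atoms with the same predicate symbol and arity
    (injectively) and anti-unify each pair."""
    mapping, result, used = {}, [], set()
    for a in goal1:
        for j, b in enumerate(goal2):
            if j not in used and a[0] == b[0] and len(a) == len(b):
                result.append(anti_unify(a, b, mapping))
                used.add(j)
                break
    return result

g1 = [("p", "a", "b"), ("q", "b")]
g2 = [("q", "c"), ("p", "d", "c")]
# One possible generalization: [p(X0, X1), q(X1)]
print(generalize_goals(g1, g2))
```

Note how the shared disagreement mapping makes the two occurrences of X1 coincide; choosing the pairing that maximizes such sharing (equivalently, minimizes the number of variables) is where the NP-hardness result mentioned above comes in.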
- Gonzague Yernaux, Wim Vanhoof & Laurent Schumacher. 2020. Moulinog: A Generator of Random Student Assignments Written in Prolog. In Proceedings of the 22nd International Symposium on Principles and Practice of Declarative Programming (PPDP '20). Association for Computing Machinery, New York, NY, USA, Article 15, 1–5.
We introduce, describe and discuss the potential of Moulinog, a tool created during the COVID-19 lockdown, designed to generate individual questionnaires for the remote evaluation of large classrooms. Starting with a list of students and a series of predicates constituting a pool of parametric questions along with rules for their parametrization, Moulinog generates a list of individual questionnaires, together with a shell script allowing an easy emailing of the (password-protected) questionnaires to the students. The tool’s use in practice is illustrated on a particular course, for which it has proven to be both useful and time-saving.
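Moulinog itself is written in Prolog; the Python sketch below only illustrates the general idea of drawing a reproducible individual questionnaire per student (the question pool is hypothetical):

```python
# Sketch of the questionnaire-generation idea behind Moulinog (illustrative
# only; the actual tool is a Prolog program).  Each "parametric question" is
# a function from a random generator to a concrete question text.
import random

# Hypothetical pool of parametric questions.
QUESTION_POOL = [
    lambda rng: f"Compute the sum of the integers from 1 to {rng.randint(10, 99)}.",
    lambda rng: f"Give the truth table of (p {'and' if rng.random() < 0.5 else 'or'} not q).",
    lambda rng: f"Write a predicate that reverses a list of {rng.randint(3, 8)} elements.",
]

def generate_questionnaire(student_id: str) -> list[str]:
    """Instantiate every question in the pool with a seed derived from the
    student identifier: individual per student, yet reproducible."""
    rng = random.Random(student_id)  # deterministic per-student generator
    return [make_question(rng) for make_question in QUESTION_POOL]

for student in ["alice@example.org", "bob@example.org"]:
    print(student)
    for i, q in enumerate(generate_questionnaire(student), start=1):
        print(f"  Q{i}. {q}")
```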
- Wim Vanhoof & Gonzague Yernaux. 2020. Generalization-Driven Semantic Clone Detection in CLP. In: Gabbrielli, M. (eds) Logic-Based Program Synthesis and Transformation. LOPSTR 2019. Lecture Notes in Computer Science (LNCS), vol 12042. Springer, Cham.
In this work we provide an algorithm capable of searching for semantic clones in CLP program code. Two code fragments are considered semantically cloned (at least to some extent) when they can both be transformed into a single code fragment, thus representing the functionality that is shared between the fragments. While the framework of what constitutes such semantic clones has been established before, it is parametrized by a set of admissible program transformations and no algorithm exists that effectively performs the search with a concrete set of allowed transformations. In this work we use the well-known unfolding and slicing transformations to establish such an algorithm, and we show how the generalization of CLP goals can be a driving factor both for controlling the search process (i.e. keeping it finite) and for guiding it (i.e. choosing which transformation(s) to apply at which moment).
- Gonzague Yernaux & Wim Vanhoof. 2019. Anti-unification in Constraint Logic Programming. Theory and Practice of Logic Programming, 19(5-6), pp. 773-789.
Anti-unification refers to the process of generalizing two (or more) goals into a single, more general, goal that captures some of the structure that is common to all initial goals. One is typically interested in computing what is often called a most specific generalization, that is, a generalization that captures a maximal amount of shared structure. In this work we address the problem of anti-unification in CLP, where goals can be seen as unordered sets of atoms and/or constraints. We show that while the concept of a most specific generalization can easily be defined in this context, computing it becomes an NP-complete problem. We subsequently introduce a generalization algorithm that computes a well-defined abstraction whose computation can be bounded by a polynomial execution time. Initial experiments show that even a naive implementation of our algorithm produces acceptable generalizations in an efficient way.
See my profile on the Université de Namur website.