FORMAS Research Center on Data and Natural Language
FORMAS studies, analyzes, and evaluates semantic-based approaches. Our research rests on five pillars: Methods, Ontology, Information Extraction, Interoperability, and Big Data. We analyze each of these areas in depth, aiming for high levels of semantic and pragmatic comprehension. Visit our website for more information.
Our Main Projects
-
The CSIS method integrates syntactic, semantic, and pragmatic analysis into university surveillance models, providing image captions for operators and systems.
-
The DIGGER method uses LLMs to provide question answering (QA) over the legal documents of the CDC, the Brazilian Consumer Protection Code.
- Brazilian Consumer Protection Code: a methodology for a dataset to Question-Answer (QA) Models @PADAWAN2024
- Towards a Corpus Methodology for LLMs in the Legal Domain @STIL2025
-
The DptOIE method extracts triples from text in the Universal Dependencies (UD) format.
- DptOIE: a Portuguese open information extraction based on dependency analysis. @AIR JOURNAL
- DPToie-Python
-
PortNOIE is a new version of DPTOIE-Neural.
- PortNOIE: A Neural Framework for Open Information Extraction for the Portuguese Language @PROPOR2024
- Extração de Informação Aberta com LLM para a Língua Portuguesa @LINGUAMATICA
- Exploring Open Information Extraction for Portuguese Using Large Language Models @PROPOR2024
- Scaling and Adapting Large Language Models for Portuguese Open Information Extraction: A Comparative Study of Fine-Tuning and LoRA @BRACIS2024
-
PTOIE-Flair is an OpenIE model for Brazilian Portuguese (pt-BR).
-
The PragmaticOIE method uses a rule-based approach to extract facts from Portuguese text at a first pragmatic level.
-
ImageCaptioningPT provides methods for generating image captions in Portuguese.
- Towards Image Captioning for the Portuguese Language: Evaluation on a Translated Dataset @ICEIS2023
- ... @JBCS2025
-
ALiBWeb is a web system for mapping Brazilian dialect areas.
- ALiBWeb: estado da arte e perspectivas futuras @WORKINGPAPER