The topics of the sessions of the third Forum are mostly inspired by the main messages issued by the FLaReNet Network in its "Blueprint of Actions and Infrastructures".
Before each session, the main outcomes and accomplishments of the project will be presented briefly.
Existing resources are very often difficult to access, for various reasons. A number of valuable and useful resources are accessible, downloadable or purchasable from different sources and in different ways. Some are available through distribution centres (notably ELRA and LDC), others from portals of projects or associations, others directly from the web pages of the laboratories or researchers who developed the resource, others still on request from the owner. In the current state of affairs researchers must still consult multiple catalogues with different approaches, structures and terms, wasting time and sometimes failing to find relevant LRs. In many such cases, unless potential users already know something about the resource they might want to use (its name, owner, project, etc.), it is difficult for them to discover new or unknown resources. Enabling identification and discovery of “missing” resources is thus a priority in our field.
This session aims at stimulating the debate on new means for discovery and identification of LRs, and the development of simple mechanisms for searching/accessing information about resources.
Language resources must now confront the current global trend towards making data freely available to everyone (Open Data), in particular since the technology is rapidly reaching the stage where language resource data can indeed be made widely available for collaborative work. This session intends to explore how the Open Data concept applies to language resources and its implications in terms of access, redistribution, reusability, and attribution, also considering what is going on in other disciplines.
The need for large LRs that encode complex, high-level and high-quality information is indisputable for the advancement of the LRT field. The high cost of their production, in terms of both time and manpower, hampers their creation. In order to reduce costs but also to encourage reusability of resources, it is important that resources be recycled as much as possible. In this session we aim at stimulating the discussion about proper management of the “life cycle” of language resource creation. We welcome contributions about practical experiences in re-use and repurposing of resources, both in terms of data and in terms of reuse of development methods, existing tools, use of translation/transliteration tools, etc.
High-quality resources should be regarded as a key booster for the deployment of effective technology that impacts large sectors of activity (e-content, media, health, automotive, telecoms, etc.). Since language resources are costly, it is necessary to start preparing now the resources that will serve the applications of the future and can positively affect the development of multilingual technologies such as Machine Translation, cross-lingual and Web 3.0 applications. This session aims at highlighting the challenges (in particular, from an industrial point of view) that need to be overcome in order to stimulate the production of the large quantity of resources required and, at the same time, to ensure the quality necessary to get acceptable results in industrial environments.
Despite the vast amount of both academic and industrial investment, existing and available resources are not enough to satisfy the various needs of all the different languages. Universal Linguistic Rights require the provision of language services for all people in their own mother tongue. Allocating funding to cover all languages (including the less well represented languages of the world) and all basic needs of language technology remains a high priority for ensuring multilingual applications in the future, and therefore language resources for all languages (including the less-resourced ones) must be developed. At the same time, it must be borne in mind that many undocumented languages that represent our cultural legacy may become extinct in the digital age.
This session aims at comparing current and emerging methodologies for the efficient development of language resources for all languages, in particular the less-resourced ones.
Long-term access to language resources must be a priority. Many resources, however, are no longer available or seem to have disappeared. Language resources can suffer from rapid obsolescence or even loss if issues of preservation and maintenance are not seriously taken into account. In this session we want to explore the most appropriate means for data archiving and preservation, maintenance of LRs, and sustainability of linguistic tools and resources, e.g. by requesting accessibility and usability of resources for a given time frame.