Abstract
While the superiority of language models over traditional search engines has become clear in recent years, their ‘generativity’ remains a serious concern for policymakers. On the input side, it is generally acknowledged that most large models are trained on unethically sourced data. And on the output side, their tendency to hallucinate and misinform makes them unfit for domain-specific work. This paper takes the view that explainability and transparency cannot be achieved simply by putting ‘a human in the loop’. Taking legal work as a concrete site, it proposes Hyvmind – an architecture that puts ‘humans in the centre’ by recording and rewarding semantic labour through tokenised annotations. Its novelty lies in conceptualising legal research as a set of four interconnected functions (source, watch, frame and curate) around a common data-object (source-text). By storing and rewarding annotative work through a distributed ledger system with nested states, it creates a secure, ethical and organic pathway for generating high-quality datasets for the next generation of domain-specific language models.
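The four functions and their shared data-object can be illustrated with a minimal sketch. This is not the paper's implementation: the class and method names (`SourceText`, `Annotation`, `annotate`) and the string payloads are hypothetical, standing in for the tokenised, ledger-backed records the architecture describes.

```python
from dataclasses import dataclass, field
from enum import Enum


class Function(Enum):
    """The four interconnected research functions named in the abstract."""
    SOURCE = "source"
    WATCH = "watch"
    FRAME = "frame"
    CURATE = "curate"


@dataclass
class Annotation:
    """A tokenised record of semantic labour performed on a source text."""
    annotator: str
    function: Function
    payload: str


@dataclass
class SourceText:
    """The common data-object around which the four functions operate."""
    cid: str  # content identifier (CID) of the underlying text
    annotations: list[Annotation] = field(default_factory=list)

    def annotate(self, annotator: str, function: Function, payload: str) -> Annotation:
        """Record semantic labour; in Hyvmind this step would mint a token on the ledger."""
        a = Annotation(annotator, function, payload)
        self.annotations.append(a)
        return a


# Usage: one source text accumulating labour across different functions.
st = SourceText(cid="bafy-example")  # hypothetical CID
st.annotate("alice", Function.FRAME, "maps the provision onto a legal ontology")
st.annotate("bob", Function.CURATE, "verifies the text against the official source")
print(len(st.annotations))  # 2
```

The sketch shows only the shape of the annotation value chain: each act of sourcing, watching, framing or curating attaches a signed, rewardable record to the same source-text object.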
Keywords
tokenised annotations, semantic labour, reciprocal produsage, nested state, legal research, ontological plurality, distributed ledger
Abbreviations
AGLI – Artificially Generated Legal Information
AVC – Annotation Value Chain
CC – Curation Contract
CID – Content Identifier
CNFT – Curation NFT
FNFT – Frame NFT
FT – Fungible Token
HDAO – Hyvmind’s Decentralised Autonomous Organisation
IAA – Inter-Annotator Agreement
ILP – Intellectual Labour Practice
LC – Location Contract
LNFT – Link NFT
LO – Legal Ontologies
NFT – Non-Fungible Token
NLP – Natural Language Processing
NNFT – Nested NFT
ST – Source Text
TNFT – Text NFT
UTS – Utility Token System
UVS – Unrealised Value Space