If you're seeing this post without reading the last, you'll probably want to go back and get the overview of the system we're discussing and why it's designed the way it is.
This article is dedicated to fleshing out the specifics of a single component in that system: the 'say'. In the chain of steps laid out in the overview, this is the second.
The first step in the chain we're discussing is an on-device system that accepts input from sensors and generates unique hashes or IDs for anything in the real world - the 'hasher'. A say is a service that maps those unique IDs to language - a sort of tagger.
We're skipping over that first hashing step for now to focus on the say because almost every other piece of this system is incidental - just a scaffolding - whereas the say is at the heart of the design.
Much of the current thinking about a future AR centers exclusively on connecting digital things directly to real people, places, and things. That idea doesn’t scale, creates artificial scarcity, reinforces existing power structures, and creates digital landlords in the process. Rather, we should enable connecting digital things to language, and focus on enabling users to choose the descriptions of the world that best suit their needs and perspective.
In response to that, at the highest level, the purpose of the system I'm proposing is to allow anyone to attach any digital payload to any real person, place, thing, or idea, and for anyone else to have access to those payloads with no intermediary.
The topic of today's post - the say - provides a protocol through which anyone can attach language to the meaningless IDs the system gives to the world, such that those who have something to share about a real thing can attach digital stuff to that language in addition to the ID itself.
To break this down one more way - first we make a meaningless but unique identification: this platypus is "eae5bb3b-9b08-4079-a32d-e789ec2fe2d6". Any device in the world running the same hasher will pull up the same ID in the presence of the same thing. Then we ask a distributed system to describe, or provide tags for "eae5bb3b-9b08-4079-a32d-e789ec2fe2d6", so we can do something with that information.
Out comes a word cloud, and rather than being the same for every user worldwide (like the ID is), the language that comes out is instead determined by who the user trusts, what language they speak, and, in effect, what their values and beliefs are.
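To make those two steps concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Say` class, its `describe` method, the `word_cloud` helper, and the sample tags are illustrative assumptions, not an existing implementation. It shows only the core idea that the ID is the same for everyone, while the language that comes out depends on which says a user has chosen to trust.

```python
from collections import Counter
from dataclasses import dataclass, field

# The meaningless but unique ID the hasher produced for the platypus.
PLATYPUS_ID = "eae5bb3b-9b08-4079-a32d-e789ec2fe2d6"


@dataclass
class Say:
    """Hypothetical say: a service that maps IDs to language (tags)."""
    name: str
    tags_by_id: dict[str, list[str]] = field(default_factory=dict)

    def describe(self, thing_id: str) -> list[str]:
        # An unknown ID simply yields no language.
        return self.tags_by_id.get(thing_id, [])


def word_cloud(thing_id: str, trusted_says: list[Say]) -> Counter:
    """Count how often each tag appears across the says a user trusts."""
    cloud: Counter = Counter()
    for say in trusted_says:
        cloud.update(say.describe(thing_id))
    return cloud


# Three made-up says with different perspectives and languages.
zoologist = Say("zoologist", {PLATYPUS_ID: ["platypus", "monotreme", "venomous"]})
kids_en = Say("kids-en", {PLATYPUS_ID: ["platypus", "cute", "duck-beaver"]})
spanish = Say("es", {PLATYPUS_ID: ["ornitorrinco", "animal"]})

# Same ID, different users, different word clouds.
alice_cloud = word_cloud(PLATYPUS_ID, [zoologist, kids_en])
bob_cloud = word_cloud(PLATYPUS_ID, [spanish])
```

Here Alice and Bob both resolve the same ID, but Alice's cloud is weighted toward 'platypus' because two of her trusted says agree on it, while Bob's contains only the Spanish tags.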
Trust and Semiosis
“So the Lord God formed of the earth every beast of the field, and every fowl of the heaven, and brought them unto the man to see how he would call them: for howsoever the man named the living creature, so was the name thereof.” (Gen. 2.19)
There was a period of time during which one could be tricked into thinking there was a mainstream consensus on reality. That time seems to be coming to an end, with some mixed consequences.
That consensus, though, consisted of a very small subset of the realm of possible ideas, and even during periods of widespread accord, we each, personally, carry subjective perspectives and opinions that may be rare, or even unique to us.
If you ask a hundred people to describe a given person, dog, building, tree, product, vehicle, artwork, shoe, frog, etc., you will, for each item, get back a distribution of language - a word cloud - with lots of overlap, and likely some outliers as well. Some of those outliers you might reject as objectively wrong, but some might be a matter of opinion.[1]
To give an example, take the frog. Someone unfamiliar with frogs but generally afraid of them might describe that frog as 'huge' and 'slimy', whereas a herpetologist might describe the frog as 'small', and furthermore 'variegated', or 'aposematic', 'toxic', 'sexually dimorphic', etc. It's entirely possible, especially in the case of things as varied and under-classified as frogs, that another herpetologist might disagree with one or more of the suggestions of the first.
Similarly one could imagine the exercise leading to some honest disagreement about, for example, the descriptors of land or territory, or about a person. Think about the words that might be used to describe the West Bank, or a controversial politician, and then compare that collection of descriptors to the language used on their Wikipedia pages.
I'm not trying to argue that knowledge-organizing projects like Wikipedia, Wikidata, the Semantic Web, corporate knowledge graphs, good old-fashioned maps, etc. represent wasted effort, but rather that they necessarily encode a viewpoint[2] on the nature of a thing, or, taken in the gestalt, on the nature of reality. In most cases that viewpoint is a compromise based on some moderation rules.
By their design, these efforts minimize contradiction and compartmentalize disagreements, creating an institutional perspective that, at a certain scale, takes on the likeness of fact, and confers that status to subjective statements contained therein. And even though projects like Wikipedia are available in a rainbow of languages, language itself encodes cultural perspectives.
When we begin connecting the digital world to the physical world, if we do so with a process where authorities, corporations, or institutions, however well-meaning, are exclusively responsible for naming and labeling the world, we run the risk of hegemonizing semiosis. As evidence of how untenable a single universal viewpoint is, Google long ago gave up on serving one map to the whole world and now shows different borders depending on who's asking.
This is not a matter of degree - something you can do better or worse - you can either make this mistake or avoid it entirely. On the internet today, anyone can provide a service that is topically about an idea, person, place, or thing, and anyone can find and reach those services, in the form of apps, web pages, and protocols, via search, shared links, direct navigation, and so on.
I've already made the case that we shouldn't trust anyone with the ability to dictate what digital things can and can't be connected to the real world. Nor should we even contemplate a system that imposes artificial limitations on how many digital things can occupy the same connection, or space. Those approaches create a new kind of property and bring landlords along into what is otherwise an unbounded new resource.
The most natural framework for mapping arbitrary data (ideas) onto the world's things and concepts is the one we already have - language. But rather than offer a top-down description of the world onto which we connect our digital information, whether crowdsourced or centrally-controlled, we should allow the descriptions themselves to be as open as the digital world they enable.
In helping machines interpret the world around them, users should be in control of whose language they employ to describe the world, and when.
The Say Cascade
In order to enable user choice in a scalable, practical way, we should build say systems with a cascading lookup in mind.
Systems that use the output from a say to perform searches for digital material could use a cascade or a hierarchy to resolve conflicts, contextually give preference to subject matter experts, keep secrets, consider provenance, and share knowledge.
Let’s think about retrieving some descriptors of a person. You see someone you recognize at a business conference, but you’re not sure how you know them. Your system hashes them, and reaches to your personal say with the resulting ID. Nothing comes up. The system falls back to the says of your friend network, looking for both friends-only and public mappings on this person’s ID, penned by folks from your personal network. Nothing there either. The system falls back to a say operated by your company, used to provide an employee directory, info on vendors, and a CRM. Nothing there either.
Next it tries a tap-dancing say, which you use to keep abreast of tap-dancing-related info when you’re watching tap-dancing, because you're into tap-dancing for some reason and maybe that's why you recognize them. Hm. This person is not a tap-dancer. Lastly, your search falls back to your chosen baseline say, a public collaborative service somewhat like Wikipedia, and finally it gets a hit. Your relationship is parasocial - this person is the CEO of Dell. You’ve never met them.
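The fallback chain in that story can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a specification: the `Say` class, the `cascade_lookup` function, the say names, and the placeholder ID are all hypothetical, and 'first hit wins' is only the simplest possible resolution policy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Say:
    """Hypothetical say: maps IDs to tags; empty unless given mappings."""
    name: str
    tags_by_id: dict[str, list[str]] = field(default_factory=dict)

    def lookup(self, thing_id: str) -> list[str]:
        return self.tags_by_id.get(thing_id, [])


def cascade_lookup(
    thing_id: str, says: list[Say]
) -> Optional[tuple[str, list[str]]]:
    """Walk the says in the user's chosen order.

    Returns (say name, tags) from the first say that knows the ID,
    or None if every say in the cascade comes up empty.
    """
    for say in says:
        tags = say.lookup(thing_id)
        if tags:
            return say.name, tags
    return None


# Placeholder for whatever ID the hasher produced for the person.
STRANGER_ID = "hypothetical-id-for-the-person-at-the-conference"

# The user's cascade, most personal first, baseline last.
cascade = [
    Say("personal"),
    Say("friends"),
    Say("company"),
    Say("tap-dancing"),
    Say("baseline", {STRANGER_ID: ["CEO", "Dell"]}),
]

result = cascade_lookup(STRANGER_ID, cascade)
```

In this run the first four says return nothing and the baseline say supplies the tags, mirroring the conference example. A fuller system might merge results from several levels, weight subject matter experts by context, or honor friends-only visibility rather than stopping at the first hit.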
As an aside, if your reaction to this premise is "that sounds an awful lot like universal facial recognition," you'd be right. This might be burying the lede, but in a system like this, anyone would be able to write anything (legal) about anyone else, just like they can now, except instead of figuring out their name and googling them, you'd have access to that information simply by looking at them.
This, of course, raises major concerns about privacy and harassment, but ones that only differ from the current situation on the web in degree, not in kind. There's plenty to discuss there but for now we'll put a pin in it and focus on user choice, as it's the theme of the post, and home in on the canonical response, or 'choos' (pronounced like 'choose').
Choos: Canonical Responses
In the world of the web, trademarks effectively entitle their holders to related domain names for the purposes covered by the trademark (big gloss here - more accurate description if you care to read it). You go to coke.com, you get the Coca-Cola corporation. For people, it's not so simple.
I don't own noahnorman.com, nor do any of the other unfortunate Noah Normans out there, except, I presume, one, and that's horse shit and possibly the instigating event of this whole enterprise. My supervillain origin story, if you will.
Further, if you search my name, my web page of choice is not guaranteed to be the top result, and if it is, that's to the exclusion of the desired result for the other Noah Normans out there. None of us are even guaranteed to show up on the first page, or any page at all.
In AR, we've just discussed how you could go from real-world things and people to language, which of course doesn't solve this issue for me. But a hash made from me - my appearance and context, translated into a unique identifier - I could potentially 'claim'.
We'll go into ideas of how this claiming system might work soon, but what this idea unlocks is the ability for individuals, trademark holders, even landowners, to offer 'canonical' augments attached to their hashes: a choos.
As an individual I might want to offer a vCard or an application to those in my presence - something like a blog or personal webpage or a storefront. I might want to extend my appearance with AR clothing. I might want to decorate the interior of my personal space, or provide decor and point of sale functionality in a restaurant I operate. I might want to virtually extend the exterior architecture of a building I own.
As a viewer, I may be interested to see the choos of a person I'm talking with, or of an entire crowd of people at once, or I may not. I may want to see architectural augments in some contexts and not others.
The possibility to offer a canonical augment doesn't preclude the ability of others to attach augments to the same hash directly, or to language that uniquely identifies a thing - it just allows, for the first time, everybody to have direct control over their preferred extension into digital space as others see it.
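As a loose sketch of how a choos could coexist with everyone else's augments, consider the Python below. All of it is assumption: the `Augment` type, the `claims` mapping, the `augments_for` function, and the `show_choos` flag are hypothetical names, and how a claim actually gets established is deliberately left out, since that mechanism hasn't been described yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Augment:
    target_id: str  # the hash the augment is attached to
    author: str     # who published it
    payload: str    # stand-in for a vCard, app, 3D model, etc.


def augments_for(thing_id, augments, claims, show_choos=True):
    """Split the augments on an ID into (choos, others).

    `claims` maps an ID to whoever has claimed it. Augments published
    by the claimant are the canonical ones (the choos); everything else
    attached to the same ID stays available alongside them.
    `show_choos` models the viewer opting out of canonical augments.
    """
    owner = claims.get(thing_id)
    choos, others = [], []
    for augment in augments:
        if augment.target_id != thing_id:
            continue
        if owner is not None and augment.author == owner:
            choos.append(augment)
        else:
            others.append(augment)
    return (choos if show_choos else []), others


PERSON_ID = "hypothetical-hash-of-a-person"
claims = {PERSON_ID: "owner"}
augments = [
    Augment(PERSON_ID, "owner", "vCard"),
    Augment(PERSON_ID, "a-stranger", "review"),
    Augment("some-other-id", "owner", "storefront"),
]

choos, others = augments_for(PERSON_ID, augments, claims)
```

The owner's vCard comes back as the choos, the stranger's review is still attached to the same hash, and a viewer who passes `show_choos=False` sees only the latter.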
This could, for many people, be the first and only time they have a digital presence, but its widespread use could change the very nature of in-person human interaction, commerce, and architecture.
I'm starting to believe a personal choos - one connected to the ID produced from your physical appearance - could be the instigating killer app / controversy flashpoint / fomo engine this system needs to get the conversation started about what it means to directly connect the digital world to the physical, and why we should care about how that's done.
If that sounds interesting or scary or both to you, if it sounds like something you'd like to help make real or stop from happening, or if you know someone I should talk to about how to do it and how to get it right, please reach out, either in the comments or via email.
Next post I'll be looking at the first step in the chain - segmentation and hashing - in more detail.
[1] I tried to find studies about the statistical distribution of words used by people asked to describe things. Came up empty-handed. If you know of any, I’d love to see them.
[2] In the case of Wikipedia, the viewpoint of an editorial cohort that is 80% male, and in the USA, 75% white. https://www.nytimes.com/2023/09/10/podcasts/the-daily/wikipedia-ai.html