wiki.trwnh.com/content/_dump/socialhub-threads/dereferencing-non-https.md
2023-04-04 18:25:35 -05:00

9.5 KiB
Raw Blame History

+++ title = "Dereferencing non-HTTPS URIs as id" date = "2019-10-11" +++

{{toc}}

Preserved text

https://socialhub.activitypub.rocks/t/dereferencing-non-https-uris-as-id/116


@trwnh:

3.1 Object Identifiers

All Objects in [ActivityStreams] should have unique global identifiers. ActivityPub extends this requirement; all objects distributed by the ActivityPub protocol MUST have unique global identifiers, unless they are intentionally transient […] These identifiers must fall into one of the following groups:

  1. Publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server. (Publicly facing content SHOULD use HTTPS URIs).
  2. An ID explicitly specified as the JSON null object, which implies an anonymous object (a part of its parent context)

Identifiers MUST be provided for activities posted in server to server communication, unless the activity is intentionally transient. However, for client to server communication, a server receiving an object posted to the outbox with no specified id SHOULD allocate an object ID in the actors namespace and attach it to the posted object.

All objects have the following properties:

id

The objects unique global identifier (unless the object is transient, in which case the id MAY be omitted).

type

The type of the object.

One thing I've wondered about is point 1 under 3.1 -- the id MUST be publicly dereferencable, with its authority belonging to the origin server... but it only SHOULD be https.

So, does this mean that there are other possible choices for a publicly dereferencable id with proper authority, that isn't HTTPS? I assume that the intention was to suggest using HTTPS over HTTP (e.g. section 3.2 assumes HTTP GET and content negotiation, as well as headers and 403/404 error codes; section 5.1/5.2 specifies HTTP POST to outbox/inbox; Section 6 uses MUST for making an HTTP POST to actor outboxes and 201 codes), but my curiosity is with other URI schemes entirely.

Perhaps one practical consideration (as there are various impractical ones, such as file://, ftp://, ftps://, sftp://, and so on) is to not use a URL but instead use a URN (such as doi, isbn, and other urn: URIs). Of course, this requires us to do a little more work to treat them as "dereferencable", such as by including a HTTPS proxy or some other resolver service. I wonder how that might be done, and whether this is at all worth pursuing. It could be used for deduplication, for example, by assigning a network-wide URN, with the authority deferred to the lookup service/proxy used as an instrument.


@cwebber:

So, does this mean that there are other possible choices for a publicly dereferencable id with proper authority, that isnt HTTPS?

Yes and the Golem demo shows exactly an example of this, using a (very-cut-down demo) of using Datashards with ActivityPub (it was called “magenc” back then) to distribute activities. A similar example could be done with IPFS, for instance (though that doesnt provide the privacy/security properties that Datashards does).

Similarly, bearcaps are a possible non-https URI scheme that we might use.

[...] Wanting to support Tor Onion Services is a reason I explicitly pushed back against pressure to make the spec https-only, which some people wanted.


@trwnh:

I guess what Im trying to comprehend the most would be, how would compatibility work between the HTTPS linked-data web, and the non-HTTPS documents? If we start passing around AP JSON-LD documents with non-HTTPS id then it would obviously break due to basically all implementations assuming that all they need to do is HTTP GET id.

I remember reading the RWoT paper about DIDs in ActivityPub and having the same confusion at the time about how it would work practically. The closest thing I could find was instrument in the AS2 Vocab, but that has a domain of Activity only, and is described as "Identifies one or more objects used (or to be used) in the completion of an Activity", so it would have to be redefined to work on Object as well.


@rinpatch:

Wouldn't an array of id work?


@trwnh:

Actually, multiple id seems interesting to me for a different thing: nomadic identity. Consider a case where an Object has multiple HTTPS URIs.

That might not be the best way to do it, though. In fact, one shortcoming that is immediately apparent to me is the question of what to do if the document retrieved from one id is different from that retrieved from another id. So it would be better to have one canonical id only, where possible. But technically, the problem of differing documents can still happen, due to something as simple as a desync.

[[future errata: this is what alsoKnownAs could be used for?]]

Id really like to focus on modeling how to dereference non-HTTPS in a way that is still roughly compatible with the HTTPS Web-based network. So far the following options have been mentioned:

  • id as Array; pick whichever one you understand.
  • url as Array; pick whichever one you understand but leave id as HTTPS.
  • just break compat and let implementations figure out how to resolve id (if they can)
  • use a local proxy directly as id (technically still globally unique but represents change of authority)
  • use a proxy Service as instrument (and extend instrument to be applied to Object and not just Activity)

@how:

Im a bit concerned about extending to objects what could be done with creating an object-specifc actor acting as a proxy. The instrument service seems to be adapted to this use-case: you get a representation of the objects metadata but must use the out of band service to retrieve the actual object. Best of both worlds?


@nightpool:

The ActivityPub standard does not define that any given server must support any one scheme. I dont understand what the purpose of having it do so would be? it would just limit the flexibility of the protocol for no practical benefit (If two servers dont support the same schemes, then theyre obviously not going to be able to talk to each other, MUST or no MUST)


@trwnh:

I think theres probably mixing of concerns to treat non-HTTPS as primarily an issue of nomadic content; while there is potential application toward this, I think the incompatibility issue is probably more pressing in the long run. You can claim that a software is “compatible with ActivityPub”, but this is a statement that needs clarification and qualification. And this is also something that would become even more of an issue if ActivityPub is implemented via networks other than HTTP(S).

In that sense, there is no “ActivityPub network” there is only currently an “ActivityPub over HTTPS network” (using HTTPS ids), as well as a sub-network of “ActivityPub over HTTPS network, with WebFinger” (perhaps more colloquially referred to as “the Mastodon network”).

Even if every (or the majority of) software implementation(s) moves to adopt bear: URIs as id, there is now still the question of which URI schemes are supported with the u parameter of the bearcap. The draft for bearcaps defines u as “The stable URL that the bearer capability URI is associated with for requests”, which is, again, not specifically HTTP(S). But even then, specifying a “stable URL” means that we are precluding URNs as a form of reference.

As nightpool mentions:

If two servers dont support the same schemes, then theyre obviously not going to be able to talk to each other

So I guess my main concern is that so-called “ActivityPub” software will settle around the lowest common denominator or the most restrictive set of requirements for interoperability. Id like to lessen that if possible, so that individual implementations have the option to explore alternative lookup strategies for other “publicly dereferencable” schemes without that implying a choice between “hard break in network compatibility” or “stuck using HTTPS for id because everyone else is”. Although a hard break might be desirable in some cases, e.g. “if you dont understand bearcaps then you cant interact with this object”.


current thoughts

we've well and truly collapsed into the "https only" world by now. any attempt to use a different scheme represents a sort of netsplit. this is not all good or bad though; it just means that the only network we have is the World Wide Web.

alsoKnownAs may be useful here as well: https://socialhub.activitypub.rocks/t/defining-alsoknownas/907/30

i see a few possible approaches:

  1. defer alsoKnownAs to mastodon's usage (concept 2, controlling the listed actor). define a new term aliases (pending name) to represent concept 1, different identifiers for the same subject. we might also define subject or some similar term to identify a "canonical" identifier?

  2. defer alsoKnownAs to the DID core definition (concept 1, different identifiers for the same subject). implement a transitional period in which mastodon and other projects switch to a different mechanism more along the lines of rel-me. eventually deprecate the use of alsoKnownAs for determining rel-me.

aside from that, the instrument idea still has some potential