trwnh.com/unified.test.hugo/content/articles/evolving-orderedcollection-to-be-more-useful/index.md
2024-10-12 13:45:14 -05:00

325 lines
No EOL
21 KiB
Markdown

+++
title = "Evolving OrderedCollection to be more useful"
date = 2024-10-04T21:50:07-06:00
toc = true
autonumbering = true
[[params.inReplyTo]]
name = "Reordering user-created collections"
url = "https://github.com/w3c/activitypub/issues/439"
[[params.syndication]]
name = "Original post on SocialHub"
url = "https://socialhub.activitypub.rocks/t/pre-fep-evolving-orderedcollection-to-be-more-useful/4608"
+++
Continuing from (and expanding on) this GitHub issue: https://github.com/w3c/activitypub/issues/439
Recently I've been thinking about OrderedCollection again and how the way it's specified doesn't actually match up with the way it's defined, or with what you'd need for it to be a useful data type. Some issues I could identify are laid out below.
## Q1: Is it a List or a Set?
Put another way: can you Add an item into an OrderedCollection more than once?
### The argument for OrderedList
If you define a Collection in terms of `items`, then it is pretty clearly a Set, and in JSON-LD it uses the `@set` container (which is the default container for every term's value). It seems pretty straightforward to look at Add and Remove as set operations which align with e.g. Python's set.add() and set.remove() functions. (In other languages, these operations may be called "insert" and "delete".)
If you define OrderedCollection in terms of `orderedItems`, then this is simply a term which defines `as:items` with a `@container: @list`, so declaring `orderedItems` instead of `items` will use the JSON-LD `@list` container instead of using the JSON-LD `@set` container. One notable thing about the `@list` container is that it is not the same as an OrderedSet -- it can actually contain duplicates. In JSON-LD, this is clearly described as an "ordered list". For example:
```json
{"@list": [{"@value": 10.0}, {"@value": 10.0}]}
```
### The argument for OrderedSet
The problem with taking OrderedCollection to be defined in terms of `orderedItems` is that OrderedCollection in AS2-Vocab is actually defined as *inheriting from* Collection. The actual definitions given are the following:
> A `Collection` is a subtype of `Object` that represents ordered or unordered sets of `Object or Link` instances.
{cite="https://www.w3.org/TR/activitystreams-vocabulary/#dfn-collection" name="<cite>Activity Vocabulary</cite> § 2 \"Core Types\" # Collection"}
> A subtype of `Collection` in which members of the logical collection are assumed to always be strictly ordered.
{cite="https://www.w3.org/TR/activitystreams-vocabulary/#dfn-orderedcollection" name="<cite>Activity Vocabulary</cite> § 2 \"Core Types\" # OrderedCollection"}
So if Collection is a set (ordered or unordered), and OrderedCollection inherits from Collection, then OrderedCollection is logically implied to also be a set (ordered).
This is consistent with the use of Add and Remove to manipulate both Collection and OrderedCollection. But that leads into the next question...
## Q2: If it's an ordered set, then is it an OrderedSet or a SortedSet?
Put another way: does Adding an item into an OrderedCollection simply append at the end, or does it potentially cause the set to be reordered to match some kind of sorting?
To give a concrete example, consider how ActivityPub mandates that OrderedCollection MUST be reverse chronological. (This has been the subject of an errata intending to relax this requirement to only apply to the properties defined as OrderedCollection *within ActivityPub*, but the general problem still applies.) ActivityPub gives this clarifying note:
> What property is used to determine the reverse chronological order is intentionally left as an implementation detail. For example, many SQL-style databases use an incrementing integer as an identifier, which can be reasonably used for handling insertion order in most cases. In other databases, an insertion time timestamp may be preferred. What is used isn't important, but the ordering of elements must remain intact, with newer items first. A property which changes regularly, such a "last updated" timestamp, should not be used.
So this clearly points toward using something like "insertion time" or "incrementing integer", which means that an Add targeting an OrderedCollection will by default insert the new item at the end of the `@list`. Or the beginning, once it's reversed.
### So how do we determine ordering?
By default, the ordering is assumed "reverse chronological by time of insertion", although with the relaxed restriction from the errata, you are free to order in any other way you wish. But if this is the case, then it would be a good idea to hint or signal what that ordering is.
#### Prior art
Looking at prior art, we have `https://schema.org/ItemList` which provides a property `https://schema.org/itemListOrder` which in turn can be either freeform text, or it can be one of the following `https://schema.org/ItemListOrderType`
- `https://schema.org/ItemListOrderAscending`
- `https://schema.org/ItemListOrderDescending`
- `https://schema.org/ItemListUnordered`
The ordering happens based on each `https://schema.org/ListItem` having a `https://schema.org/position` property. You can navigate a doubly-linked list with `https://schema.org/nextItem` or `https://schema.org/previousItem` on each ListItem.
#### What can we do in AS2?
It doesn't immediately make sense to put `nextItem` and `previousItem` style links directly on the object, because an object may be part of multiple collections. I guess you could maybe inject those properties when rendering or presenting an OrderedCollection, but this wouldn't really make sense either. The ordering in an OrderedCollection is already handled by JSON-LD's `@list` container, which in plain JSON would be an ordered array. (With paging, you can follow `as:first`/`as:next`* or you can follow `as:last`/`as:prev`* to traverse the entire OrderedCollection, one OrderedCollectionPage at a time.)
So maybe we could define some property `orderType`, which gives a hint as to how the collection is ordered? The value of this property is left kind of open-ended, but it could be handled by vocabulary terms, if you just define the `orderType` term as `@type: @vocab`. This would allow you to explicitly define classes for things like `ReverseChronological`, `ForwardChronological`, or any other well-defined specific concept:
```json
{
"@context": [
"https://www.w3.org/ns/activitystreams",
{
"orderType": {
"@id": "https://w3id.org/fep/xxxx/orderType",
"@type": "@vocab"
},
"ReverseChronological": "https://w3id.org/fep/xxxx/ReverseChronological",
"ForwardChronological": "https://w3id.org/fep/xxxx/ForwardChronological"
}
],
"id": "https://domain.example/some-collection",
"type": "OrderedCollection",
"orderedItems": [
"https://domain.example/objects/3",
"https://domain.example/objects/2",
"https://domain.example/objects/1"
],
"orderType": "ReverseChronological"
}
```
This at least gives us one more option to be expressive; we are no longer assuming everything to be reverse chronological, but we can instead be explicit about things being forward chronological or reverse chronological.
### Maybe we need a SortedCollection type analogue to OrderedCollection?
It may be that Adding something into the collection *doesn't* simply append it at the end, but rather causes the entire collection to be re-sorted. For example, consider a collection representing a conversation which is ordered "forward chronologically", but in constructing the collection, we missed an item somewhere in the middle. Rather than start over entirely, or just living with having the missed item be appended out-of-order at the end, we might instead define a new type of collection to handle this use case. Whereas the `Collection` represents an UnorderedSet data type, and an `OrderedCollection` represents an OrderedSet data type, we can define `SortedCollection` to represent a SortedSet. It inherits from OrderedCollection, since a SortedSet is just an OrderedSet that reorders itself upon any new insertion. We can also define a property `sortType` which is an analogue of the `orderType` we defined above.
```json
{
"@context": [
"https://www.w3.org/ns/activitystreams",
{
"orderType": {
"@id": "https://w3id.org/fep/xxxx/orderType",
"@type": "@vocab"
},
"ReverseChronological": "https://w3id.org/fep/xxxx/ReverseChronological",
"ForwardChronological": "https://w3id.org/fep/xxxx/ForwardChronological",
"SortedCollection": "https://w3id.org/fep/xxxx/SortedCollection",
"sortType": {
"@id": "https://w3id.org/fep/xxxx/sortType",
"@type": "@vocab"
}
}
],
"id": "https://domain.example/some-collection",
"type": "SortedCollection",
"orderedItems": [
"https://domain.example/objects/4",
"https://domain.example/objects/2",
"https://domain.example/objects/1"
],
"sortType": "ReverseChronological"
}
```
But now we run into a caveat: it might not be sufficient to define classes/types for `sortType`, because the sorting might happen based on some property instead. So it looks like we're gonna need another mechanism...
Maybe something like `sortedBy` and `sortOrder`? These would both be `@type: @vocab`. The `sortedBy` would be a functional property that indicates which property is being used for sorting, for example, we could sort by `as:published` (although perhaps we shouldn't?); the `sortOrder` would take one of two type/class/vocab values: either `Ascending` or `Descending`. So our example object and context is starting to look like this:
```json
{
"@context": [
"https://www.w3.org/ns/activitystreams",
{
"orderType": {
"@id": "https://w3id.org/fep/xxxx/orderType",
"@type": "@vocab"
},
"ReverseChronological": "https://w3id.org/fep/xxxx/ReverseChronological",
"ForwardChronological": "https://w3id.org/fep/xxxx/ForwardChronological",
"SortedCollection": "https://w3id.org/fep/xxxx/SortedCollection",
"sortedBy": {
"@id": "https://w3id.org/fep/xxxx/sortedBy",
"@type": "@vocab"
},
"sortOrder": {
"@id": "https://w3id.org/fep/xxxx/sortOrder",
"@type": "@vocab"
},
"Ascending": "https://w3id.org/fep/xxxx/Ascending",
"Descending": "https://w3id.org/fep/xxxx/Descending"
}
],
"id": "https://domain.example/some-collection",
"type": "SortedCollection",
"orderedItems": [
"https://domain.example/objects/4",
"https://domain.example/objects/2",
"https://domain.example/objects/1"
],
"sortedBy": "published",
"sortOrder": "Descending"
}
```
Now finally, we can send an Add and know for sure that it will be inserted in the right index/position of the underlying `@list` container:
```json
{
"@context": "https://www.w3.org/ns/activitystreams",
"id": "https://domain.example/some-activity",
"actor": "https://domain.example/some-actor",
"type": "Add",
"object": "https://domain.example/objects/3",
"target": "https://domain.example/some-collection"
}
```
We now expect that the result will be `[4, 3, 2, 1]` instead of `[3,4,2,1]`.
We might also need a comparison function, but this is currently left unconsidered.
## Q3: Are there other ways to approach insertion into an OrderedCollection?
We might not *want* our OrderedSet to be a SortedSet specifically. We might want some specific ordering that is also specifically not sorted according to any criteria or property.
### Inserting a new item at a specific position
In the most basic case, this might be doable with a property like `insertAfter`? This would allow us to specify where in the `@list` to insert the new item.
Say we have `orderedItems: [4, 2, 1]`. We might formulate the following Add activity:
```json
{
"@context": [
"https://www.w3.org/ns/activitystreams",
{
"insertAfter": {
"@id": "https://w3id.org/fep/xxxx/insertAfter",
"@type": "@id"
}
}
],
"id": "https://domain.example/some-activity",
"actor": "https://domain.example/some-actor",
"type": "Add",
"object": "https://domain.example/objects/3",
"insertAfter": "https://domain.example/objects/4"
}
```
Did you spot the flaw in this? We have a problem: the receiving server might not understand the `insertAfter` property, and therefore might process the `Add` in a way that we didn't expect! We end up with `[3, 4, 2, 1]` because the `insertAfter: 4` wasn't understood, and so we didn't get the desired result of `[4, 3, 2, 1]`.
This can be fixed by changing the `Add` to an `Insert`. Now, the server has to specifically understand the Insert activity and its side effects, or else the activity will not be processed.
```json
{
"@context": [
"https://www.w3.org/ns/activitystreams",
{
"insertAfter": {
"@id": "https://w3id.org/fep/xxxx/insertAfter",
"@type": "@id"
},
"Insert": "https://w3id.org/fep/xxxx/Insert"
}
],
"id": "https://domain.example/some-activity",
"actor": "https://domain.example/some-actor",
"type": "Insert",
"object": "https://domain.example/objects/3",
"insertAfter": "https://domain.example/objects/4"
}
```
We could also define an `insertBefore` property, but I think this isn't necessary and overall just complicates things. Generally when updating a linked list, you traverse it going forwards; the `@list` of `orderedItems` is also presented in forward ordering. It therefore makes the implementation simpler at a data structure level to only define `insertAfter` and no `insertBefore`.
One remaining caveat has to do with ActivityPub delivery. In the case that we send an activity to our local server via C2S, we are not guaranteed that it will have any side effects processed; the activity may be malformed or otherwise not processable. But the presence of addressing properties like `to`, `cc` and `audience` will cause the activity to be delivered to the recipients regardless of whether it had any local side effects -- perhaps the remote server will know what to do with the activity. If the remote server also doesn't know how to process the activity, then you end up with a no-op on both your local server and on the remote server. Perhaps some client reading their `inbox` might be able to derive some meaning from it...?
#### Inserting multiple items? (WIP)
Maybe we could do `"object": {"@list": [...]}` but this might not be friendly to LD-unaware processors. Also this doesn't allow for inserting multiple items at different locations, but maybe those should be separate Insert activities/operations anyway.
### Moving an item to a new position in the list? (WIP)
Just like we defined an Insert activity earlier, we might want to use something like a Move activity to reorder our OrderedCollection. Say we have a list in the form `orderedItems: [3, 4, 2, 1]`, perhaps by some permutation of aware or unaware servers leading to the error case from before. Well, how do we fix this?
What we want in this case is to move `3` to be after `4`. (Alternatively, we can achieve the same result by moving `4` to come before `3`, but we already established that "insert before" is harder than "insert after". If we need to reconsider this, then it can be reconsidered.)
We might formulate "move `3` to be after `4`" with the following Move activity:
```json
{
"@context": [
"https://www.w3.org/ns/activitystreams",
{
"insertAfter": {
"@id": "https://w3id.org/fep/xxxx/insertAfter",
"@type": "@id"
},
"Insert": "https://w3id.org/fep/xxxx/Insert"
}
],
"id": "https://domain.example/some-activity",
"actor": "https://domain.example/some-actor",
"type": "Move",
"object": "https://domain.example/objects/3",
"origin": "https://domain.example/some-collection",
"target": "https://domain.example/some-collection",
"insertAfter": "https://domain.example/objects/4"
}
```
This `Move` activity is equivalent to doing a `Remove` first, followed by a new `Insert`; compare to a `Move` activity normally being equivalent to doing a `Remove` first and then following up with an `Add`.
But we run into the same problem as before -- namely, it's not clear whether `Move` in this case means "Remove from `origin` and Add to `target`", or if it means "Remove from `origin` and Insert into `target`". So we probably need a new activity type for this kind of operation. I have no idea what to call it.
For now, the safest thing is probably to fall back to doing this in 2 activities: first, send a `Remove`, then send an `Insert`.
### Completely reordering the items
Given a complex enough OrderedCollection, you might find it easier to just create a new collection and Add or Insert items as appropriate... but this is not an option if you want to preserve the collection's id. (Well, it could be an option, but you'd have to define a mechanism to reassign identifiers first. Perhaps a `Migrate` activity could be defined in some other FEP? Alternatively, `Move` might work, but I'd be wary of overloading the semantics of "moving" an object. Of course, Mastodon's "account migration" feature already uses `Move` in exactly this overloading way, so...)
So given the same OrderedCollection, we might consider an `Update` that changes the `orderedItems` around:
```json
{
"@context": "https://www.w3.org/ns/activitystreams",
"id": "https://domain.example/some-activity",
"actor": "https://domain.example/some-actor",
"type": "Update",
"object": {
"id": "https://domain.example/some-collection",
"orderedItems": [
"https://domain.example/objects/4",
"https://domain.example/objects/3",
"https://domain.example/objects/2",
"https://domain.example/objects/1",
]
}
}
```
In the case where an OrderedCollection is *unpaged* but just very very large, then the answer is to simply construct a similarly large Update activity. But the larger the collection grows, the more likely it is to be paged. So if the OrderedCollection is paged and the pages are reified, you might only be able to Update one page at a time. You might even find yourself needing to Move items between one reified page and another! So maybe you can construct a new OrderedCollectionPage and carefully update the `next` and `prev` links of what is effectively a linked list. The exact algorithm for doing this is left as an exercise for the reader, because this is already getting to be a very long post, and adding more examples for this increasingly niche and situation-dependent use case is going to make it even longer... eh, perhaps in a follow-up.
## Current summary and conclusions
- An OrderedCollection's items are expressed in JSON-LD as a `@list` but the data type is more correctly an ordered set, not an ordered list. This is because of the AS2-Vocab definitions. This means that items can only be included in a Collection at most once.
- Ordering is determined by time-of-insertion and can be signalled by something like a property `orderType` and some classes like `ForwardChronological` and `ReverseChronological` to indicate if the latest appended item will be the first item (LIFO/FILO) or if the latest appended item will be the last item (LILO/FIFO).
- We could define a new type `SortedCollection` specifically for collections that are not only ordered, but also sorted according to some criteria. Time-of-insertion doesn't matter here, as any newly inserted item will take its place within the sorting. We might define the `sortType` somehow using vocab terms again, but it is more useful to use `sortedBy` pointing to a property name, and `sortOrder` can be one of two classes, either `Ascending` (smallest first) or `Descending` (largest first). We might also need a comparison function, but this is currently left unconsidered.
- For cases where the collection is ordered but not sorted, we could define an `Insert` activity and an `insertAfter` property to point to the object after which the Insert's object will be inserted. Currently left unconsidered is the possibility of inserting at a specific index numerically, as well as the ability to `insertBefore`.
- Inserting multiple objects into an OrderedCollection also needs to be thought about.
- Moving items typically is equivalent to Remove then Add. We might need an activity that is equivalent to Remove then Insert, but I have no idea what to call it. For now, the safest thing is probably to fall back to doing this in 2 activities: first, send a `Remove`, then send an `Insert`.
- Items might be completely reordered with an Update targeting the orderedItems property but this can be kinda complicated when dealing with paged collections, especially if pages are reified. You will possibly have to construct new pages and manipulate the `next`/`prev` links between them.
- For now, we neglect to consider or define non-set types, such as List which is explicitly allowed to contain the same item multiple times.