21 KiB
+++ title = "Evolving OrderedCollection to be more useful" date = 2024-10-04T21:50:07-06:00 toc = true autonumbering = true params.inReplyTo name = "Reordering user-created collections" url = "https://github.com/w3c/activitypub/issues/439" params.syndication name = "Original post on SocialHub" url = "https://socialhub.activitypub.rocks/t/pre-fep-evolving-orderedcollection-to-be-more-useful/4608" +++
Continuing from (and expanding on) this GitHub issue: https://github.com/w3c/activitypub/issues/439
Recently I've been thinking about OrderedCollection again and how the way it's specified doesn't actually match up with the way it's defined, or with what you'd need for it to be a useful data type. Some issues I could identify are laid out below.
Q1: Is it a List or a Set?
Put another way: can you Add an item into an OrderedCollection more than once?
The argument for OrderedList
If you define a Collection in terms of items
, then it is pretty clearly a Set, and in JSON-LD it uses the @set
container (which is the default container for every term's value). It seems pretty straightforward to look at Add and Remove as set operations which align with e.g. Python's set.add() and set.remove() functions. (In other languages, these operations may be called "insert" and "delete".)
If you define OrderedCollection in terms of orderedItems
, then this is simply a term which defines as:items
with a @container: @list
, so declaring orderedItems
instead of items
will use the JSON-LD @list
container instead of using the JSON-LD @set
container. One notable thing about the @list
container is that it is not the same as an OrderedSet -- it can actually contain duplicates. In JSON-LD, this is clearly described as an "ordered list". For example:
{"@list": [{"@value": 10.0}, {"@value": 10.0}]}
The argument for OrderedSet
The problem with taking OrderedCollection to be defined in terms of orderedItems
is that OrderedCollection in AS2-Vocab is actually defined as inheriting from Collection. The actual definitions given are the following:
A
Collection
is a subtype ofObject
that represents ordered or unordered sets ofObject or Link
instances. {cite="https://www.w3.org/TR/activitystreams-vocabulary/#dfn-collection" name="Activity Vocabulary § 2 "Core Types" # Collection"}
A subtype of
Collection
in which members of the logical collection are assumed to always be strictly ordered. {cite="https://www.w3.org/TR/activitystreams-vocabulary/#dfn-orderedcollection" name="Activity Vocabulary § 2 "Core Types" # OrderedCollection"}
So if Collection is a set (ordered or unordered), and OrderedCollection inherits from Collection, then OrderedCollection is logically implied to also be a set (ordered).
This is consistent with the use of Add and Remove to manipulate both Collection and OrderedCollection. But that leads into the next question...
Q2: If it's an ordered set, then is it an OrderedSet or a SortedSet?
Put another way: does Adding an item into an OrderedCollection simply append at the end, or does it potentially cause the set to be reordered to match some kind of sorting?
To give a concrete example, consider how ActivityPub mandates that OrderedCollection MUST be reverse chronological. (This has been the subject of an errata intending to relax this requirement to only apply to the properties defined as OrderedCollection within ActivityPub, but the general problem still applies.) ActivityPub gives this clarifying note:
What property is used to determine the reverse chronological order is intentionally left as an implementation detail. For example, many SQL-style databases use an incrementing integer as an identifier, which can be reasonably used for handling insertion order in most cases. In other databases, an insertion time timestamp may be preferred. What is used isn't important, but the ordering of elements must remain intact, with newer items first. A property which changes regularly, such a "last updated" timestamp, should not be used.
So this clearly points toward using something like "insertion time" or "incrementing integer", which means that an Add targeting an OrderedCollection will by default insert the new item at the end of the @list
. Or the beginning, once it's reversed.
So how do we determine ordering?
By default, the ordering is assumed "reverse chronological by time of insertion", although with the relaxed restriction from the errata, you are free to order in any other way you wish. But if this is the case, then it would be a good idea to hint or signal what that ordering is.
Prior art
Looking at prior art, we have https://schema.org/ItemList
which provides a property https://schema.org/itemListOrder
which in turn can be either freeform text, or it can be one of the following https://schema.org/ItemListOrderType
https://schema.org/ItemListOrderAscending
https://schema.org/ItemListOrderDescending
https://schema.org/ItemListUnordered
The ordering happens based on each https://schema.org/ListItem
having a https://schema.org/position
property. You can navigate a doubly-linked list with https://schema.org/nextItem
or https://schema.org/previousItem
on each ListItem.
What can we do in AS2?
It doesn't immediately make sense to put nextItem
and previousItem
style links directly on the object, because an object may be part of multiple collections. I guess you could maybe inject those properties when rendering or presenting an OrderedCollection, but this wouldn't really make sense either. The ordering in an OrderedCollection is already handled by JSON-LD's @list
container, which in plain JSON would be an ordered array. (With paging, you can follow as:first
/as:next
* or you can follow as:last
/as:prev
* to traverse the entire OrderedCollection, one OrderedCollectionPage at a time.)
So maybe we could define some property orderType
, which gives a hint as to how the collection is ordered? The value of this property is left kind of open-ended, but it could be handled by vocabulary terms, if you just define the orderType
term as @type: @vocab
. This would allow you to explicitly define classes for things like ReverseChronological
, ForwardChronological
, or any other well-defined specific concept:
{
"@context": [
"https://www.w3.org/ns/activitystreams",
{
"orderType": {
"@id": "https://w3id.org/fep/xxxx/orderType",
"@type": "@vocab"
},
"ReverseChronological": "https://w3id.org/fep/xxxx/ReverseChronological",
"ForwardChronological": "https://w3id.org/fep/xxxx/ForwardChronological"
}
],
"id": "https://domain.example/some-collection",
"type": "OrderedCollection",
"orderedItems": [
"https://domain.example/objects/3",
"https://domain.example/objects/2",
"https://domain.example/objects/1"
],
"orderType": "ReverseChronological"
}
This at least gives us one more option to be expressive; we are no longer assuming everything to be reverse chronological, but we can instead be explicit about things being forward chronological or reverse chronological.
Maybe we need a SortedCollection type analogue to OrderedCollection?
It may be that Adding something into the collection doesn't simply append it at the end, but rather causes the entire collection to be re-sorted. For example, consider a collection representing a conversation which is ordered "forward chronologically", but in constructing the collection, we missed an item somewhere in the middle. Rather than start over entirely, or just living with having the missed item be appended out-of-order at the end, we might instead define a new type of collection to handle this use case. Whereas the Collection
represents an UnorderedSet data type, and an OrderedCollection
represents an OrderedSet data type, we can define SortedCollection
to represent a SortedSet. It inherits from OrderedCollection, since a SortedSet is just an OrderedSet that reorders itself upon any new insertion. We can also define a property sortType
which is an analogue of the orderType
we defined above.
{
"@context": [
"https://www.w3.org/ns/activitystreams",
{
"orderType": {
"@id": "https://w3id.org/fep/xxxx/orderType",
"@type": "@vocab"
},
"ReverseChronological": "https://w3id.org/fep/xxxx/ReverseChronological",
"ForwardChronological": "https://w3id.org/fep/xxxx/ForwardChronological",
"SortedCollection": "https://w3id.org/fep/xxxx/SortedCollection",
"sortType": {
"@id": "https://w3id.org/fep/xxxx/sortType",
"@type": "@vocab"
}
}
],
"id": "https://domain.example/some-collection",
"type": "SortedCollection",
"orderedItems": [
"https://domain.example/objects/4",
"https://domain.example/objects/2",
"https://domain.example/objects/1"
],
"sortType": "ReverseChronological"
}
But now we run into a caveat: it might not be sufficient to define classes/types for sortType
, because the sorting might happen based on some property instead. So it looks like we're gonna need another mechanism...
Maybe something like sortedBy
and sortOrder
? These would both be @type: @vocab
. The sortedBy
would be a functional property that indicates which property is being used for sorting, for example, we could sort by as:published
(although perhaps we shouldn't?); the sortOrder
would take one of two type/class/vocab values: either Ascending
or Descending
. So our example object and context is starting to look like this:
{
"@context": [
"https://www.w3.org/ns/activitystreams",
{
"orderType": {
"@id": "https://w3id.org/fep/xxxx/orderType",
"@type": "@vocab"
},
"ReverseChronological": "https://w3id.org/fep/xxxx/ReverseChronological",
"ForwardChronological": "https://w3id.org/fep/xxxx/ForwardChronological",
"SortedCollection": "https://w3id.org/fep/xxxx/SortedCollection",
"sortedBy": {
"@id": "https://w3id.org/fep/xxxx/sortedBy",
"@type": "@vocab"
},
"sortOrder": {
"@id": "https://w3id.org/fep/xxxx/sortOrder",
"@type": "@vocab"
},
"Ascending": "https://w3id.org/fep/xxxx/Ascending",
"Descending": "https://w3id.org/fep/xxxx/Descending"
}
],
"id": "https://domain.example/some-collection",
"type": "SortedCollection",
"orderedItems": [
"https://domain.example/objects/4",
"https://domain.example/objects/2",
"https://domain.example/objects/1"
],
"sortedBy": "published",
"sortOrder": "Descending"
}
Now finally, we can send an Add and know for sure that it will be inserted in the right index/position of the underlying @list
container:
{
"@context": "https://www.w3.org/ns/activitystreams",
"id": "https://domain.example/some-activity",
"actor": "https://domain.example/some-actor",
"type": "Add",
"object": "https://domain.example/objects/3",
"target": "https://domain.example/some-collection"
}
We now expect that the result will be [4, 3, 2, 1]
instead of [3,4,2,1]
.
We might also need a comparison function, but this is currently left unconsidered.
Q3: Are there other ways to approach insertion into an OrderedCollection?
We might not want our OrderedSet to be a SortedSet specifically. We might want some specific ordering that is also specifically not sorted according to any criteria or property.
Inserting a new item at a specific position
In the most basic case, this might be doable with a property like insertAfter
? This would allow us to specify where in the @list
to insert the new item.
Say we have orderedItems: [4, 2, 1]
. We might formulate the following Add activity:
{
"@context": [
"https://www.w3.org/ns/activitystreams",
{
"insertAfter": {
"@id": "https://w3id.org/fep/xxxx/insertAfter",
"@type": "@id"
}
}
],
"id": "https://domain.example/some-activity",
"actor": "https://domain.example/some-actor",
"type": "Add",
"object": "https://domain.example/objects/3",
"insertAfter": "https://domain.example/objects/4"
}
Did you spot the flaw in this? We have a problem: the receiving server might not understand the insertAfter
property, and therefore might process the Add
in a way that we didn't expect! We end up with [3, 4, 2, 1]
because the insertAfter: 4
wasn't understood, and so we didn't get the desired result of [4, 3, 2, 1]
.
This can be fixed by changing the Add
to an Insert
. Now, the server has to specifically understand the Insert activity and its side effects, or else the activity will not be processed.
{
"@context": [
"https://www.w3.org/ns/activitystreams",
{
"insertAfter": {
"@id": "https://w3id.org/fep/xxxx/insertAfter",
"@type": "@id"
},
"Insert": "https://w3id.org/fep/xxxx/Insert"
}
],
"id": "https://domain.example/some-activity",
"actor": "https://domain.example/some-actor",
"type": "Insert",
"object": "https://domain.example/objects/3",
"insertAfter": "https://domain.example/objects/4"
}
We could also define an insertBefore
property, but I think this isn't necessary and overall just complicates things. Generally when updating a linked list, you traverse it going forwards; the @list
of orderedItems
is also presented in forward ordering. It therefore makes the implementation simpler at a data structure level to only define insertAfter
and no insertBefore
.
One remaining caveat has to do with ActivityPub delivery. In the case that we send an activity to our local server via C2S, we are not guaranteed that it will have any side effects processed; the activity may be malformed or otherwise not processable. But the presence of addressing properties like to
, cc
and audience
will cause the activity to be delivered to the recipients regardless of whether it had any local side effects -- perhaps the remote server will know what to do with the activity. If the remote server also doesn't know how to process the activity, then you end up with a no-op on both your local server and on the remote server. Perhaps some client reading their inbox
might be able to derive some meaning from it...?
Inserting multiple items? (WIP)
Maybe we could do "object": {"@list": [...]}
but this might not be friendly to LD-unaware processors. Also this doesn't allow for inserting multiple items at different locations, but maybe those should be separate Insert activities/operations anyway.
Moving an item to a new position in the list? (WIP)
Just like we defined an Insert activity earlier, we might want to use something like a Move activity to reorder our OrderedCollection. Say we have a list in the form orderedItems: [3, 4, 2, 1]
, perhaps by some permutation of aware or unaware servers leading to the error case from before. Well, how do we fix this?
What we want in this case is to move 3
to be after 4
. (Alternatively, we can achieve the same result by moving 4
to come before 3
, but we already established that "insert before" is harder than "insert after". If we need to reconsider this, then it can be reconsidered.)
We might formulate "move 3
to be after 4
" with the following Move activity:
{
"@context": [
"https://www.w3.org/ns/activitystreams",
{
"insertAfter": {
"@id": "https://w3id.org/fep/xxxx/insertAfter",
"@type": "@id"
},
"Insert": "https://w3id.org/fep/xxxx/Insert"
}
],
"id": "https://domain.example/some-activity",
"actor": "https://domain.example/some-actor",
"type": "Move",
"object": "https://domain.example/objects/3",
"origin": "https://domain.example/some-collection",
"target": "https://domain.example/some-collection",
"insertAfter": "https://domain.example/objects/4"
}
This Move
activity is equivalent to doing a Remove
first, followed by a new Insert
; compare to a Move
activity normally being equivalent to doing a Remove
first and then following up with an Add
.
But we run into the same problem as before -- namely, it's not clear whether Move
in this case means "Remove from origin
and Add to target
", or if it means "Remove from origin
and Insert into target
". So we probably need a new activity type for this kind of operation. I have no idea what to call it.
For now, the safest thing is probably to fall back to doing this in 2 activities: first, send a Remove
, then send an Insert
.
Completely reordering the items
Given a complex enough OrderedCollection, you might find it easier to just create a new collection and Add or Insert items as appropriate... but this is not an option if you want to preserve the collection's id. (Well, it could be an option, but you'd have to define a mechanism to reassign identifiers first. Perhaps a Migrate
activity could be defined in some other FEP? Alternatively, Move
might work, but I'd be wary of overloading the semantics of "moving" an object. Of course, Mastodon's "account migration" feature already uses Move
in exactly this overloading way, so...)
So given the same OrderedCollection, we might consider an Update
that changes the orderedItems
around:
{
"@context": "https://www.w3.org/ns/activitystreams",
"id": "https://domain.example/some-activity",
"actor": "https://domain.example/some-actor",
"type": "Update",
"object": {
"id": "https://domain.example/some-collection",
"orderedItems": [
"https://domain.example/objects/4",
"https://domain.example/objects/3",
"https://domain.example/objects/2",
"https://domain.example/objects/1",
]
}
}
In the case where an OrderedCollection is unpaged but just very very large, then the answer is to simply construct a similarly large Update activity. But the larger the collection grows, the more likely it is to be paged. So if the OrderedCollection is paged and the pages are reified, you might only be able to Update one page at a time. You might even find yourself needing to Move items between one reified page and another! So maybe you can construct a new OrderedCollectionPage and carefully update the next
and prev
links of what is effectively a linked list. The exact algorithm for doing this is left as an exercise for the reader, because this is already getting to be a very long post, and adding more examples for this increasingly niche and situation-dependent use case is going to make it even longer... eh, perhaps in a follow-up.
Current summary and conclusions
- An OrderedCollection's items are expressed in JSON-LD as a
@list
but the data type is more correctly an ordered set, not an ordered list. This is because of the AS2-Vocab definitions. This means that items can only be included in a Collection at most once. - Ordering is determined by time-of-insertion and can be signalled by something like a property
orderType
and some classes likeForwardChronological
andReverseChronological
to indicate if the latest appended item will be the first item (LIFO/FILO) or if the latest appended item will be the last item (LILO/FIFO). - We could define a new type
SortedCollection
specifically for collections that are not only ordered, but also sorted according to some criteria. Time-of-insertion doesn't matter here, as any newly inserted item will take its place within the sorting. We might define thesortType
somehow using vocab terms again, but it is more useful to usesortedBy
pointing to a property name, andsortOrder
can be one of two classes, eitherAscending
(smallest first) orDescending
(largest first). We might also need a comparison function, but this is currently left unconsidered. - For cases where the collection is ordered but not sorted, we could define an
Insert
activity and aninsertAfter
property to point to the object after which the Insert's object will be inserted. Currently left unconsidered is the possibility of inserting at a specific index numerically, as well as the ability toinsertBefore
.- Inserting multiple objects into an OrderedCollection also needs to be thought about.
- Moving items typically is equivalent to Remove then Add. We might need an activity that is equivalent to Remove then Insert, but I have no idea what to call it. For now, the safest thing is probably to fall back to doing this in 2 activities: first, send a
Remove
, then send anInsert
. - Items might be completely reordered with an Update targeting the orderedItems property but this can be kinda complicated when dealing with paged collections, especially if pages are reified. You will possibly have to construct new pages and manipulate the
next
/prev
links between them.
- For now, we neglect to consider or define non-set types, such as List which is explicitly allowed to contain the same item multiple times.