trwnh.com/unified.test.hugo/content/theorycrafting/evolving-orderedcollection-to-be-more-useful/index.md

23 KiB

+++ title = "Evolving OrderedCollection to be more useful" date = 2024-10-04T21:50:07-06:00 toc = true autonumbering = true streams = ["all"] tags = ["activitypub", "orderedcollection", "as2"] params.inReplyTo name = "Reordering user-created collections" url = "https://github.com/w3c/activitypub/issues/439" params.syndication name = "Original post on SocialHub" url = "https://socialhub.activitypub.rocks/t/pre-fep-evolving-orderedcollection-to-be-more-useful/4608" +++

Continuing from (and expanding on) this GitHub issue: https://github.com/w3c/activitypub/issues/439

Recently I've been thinking about OrderedCollection again and how the way it's specified doesn't actually match up with the way it's defined, or with what you'd need for it to be a useful data type. Some issues I could identify are laid out below.

Q1: Is it a List or a Set?

Put another way: can you Add an item into an OrderedCollection more than once?

The argument for OrderedList

If you define a Collection in terms of items, then it is pretty clearly a Set, and in JSON-LD it uses the @set container (which is the default container for every term's value). It seems pretty straightforward to look at Add and Remove as set operations which align with e.g. Python's set.add() and set.remove() functions. (In other languages, these operations may be called "insert" and "delete".)

If you define OrderedCollection in terms of orderedItems, then this is simply a term which defines as:items with a @container: @list, so declaring orderedItems instead of items will use the JSON-LD @list container instead of using the JSON-LD @set container. One notable thing about the @list container is that it is not the same as an OrderedSet -- it can actually contain duplicates. In JSON-LD, this is clearly described as an "ordered list". For example:

{"@list": [{"@id": "https://url.example"}, {"@id": "https://url.example"}]}

The argument for OrderedSet

The problem with taking OrderedCollection to be defined in terms of orderedItems is that OrderedCollection in AS2-Vocab is actually defined as inheriting from Collection. The actual definitions given are the following:

A Collection is a subtype of Object that represents ordered or unordered sets of Object or Link instances. {cite="https://www.w3.org/TR/activitystreams-vocabulary/#dfn-collection" title="Activity Vocabulary" ref="§ 2 "Core Types" # Collection"}

A subtype of Collection in which members of the logical collection are assumed to always be strictly ordered. {cite="https://www.w3.org/TR/activitystreams-vocabulary/#dfn-orderedcollection" title="Activity Vocabulary" ref="§ 2 "Core Types" # OrderedCollection"}

So if Collection is a set (ordered or unordered), and OrderedCollection inherits from Collection, then OrderedCollection is logically implied to also be a set (ordered).

This is consistent with the use of Add and Remove to manipulate both Collection and OrderedCollection. But that leads into the next question...

Q2: If it's an ordered set, then is it an OrderedSet or a SortedSet?

Put another way: does Adding an item into an OrderedCollection simply append at the end, or does it potentially cause the set to be reordered to match some kind of sorting?

To give a concrete example, consider how ActivityPub mandates that OrderedCollection MUST be reverse chronological. (This has been the subject of an errata intending to relax this requirement to only apply to the properties defined as OrderedCollection within ActivityPub, but the general problem still applies.) ActivityPub gives this clarifying note:

What property is used to determine the reverse chronological order is intentionally left as an implementation detail. For example, many SQL-style databases use an incrementing integer as an identifier, which can be reasonably used for handling insertion order in most cases. In other databases, an insertion time timestamp may be preferred. What is used isn't important, but the ordering of elements must remain intact, with newer items first. A property which changes regularly, such a "last updated" timestamp, should not be used. {cite="https://www.w3.org/TR/activitypub/#collections" title="ActivityPub" ref="§ 5 "Collections" # Note 3"}

So this clearly points toward using something like "insertion time" or "incrementing integer", which means that an Add targeting an OrderedCollection will by default insert the new item at the end of the @list. Or the beginning, once it's reversed.

So how do we determine ordering?

By default, the ordering is assumed "reverse chronological by time of insertion", although with the relaxed restriction from the errata, you are free to order in any other way you wish. But if this is the case, then it would be a good idea to hint or signal what that ordering is.

Prior art

Looking at prior art, we have https://schema.org/ItemList which provides a property https://schema.org/itemListOrder which in turn can be either freeform text, or it can be one of the following https://schema.org/ItemListOrderType

  • https://schema.org/ItemListOrderAscending
  • https://schema.org/ItemListOrderDescending
  • https://schema.org/ItemListUnordered

The ordering happens based on each https://schema.org/ListItem having a https://schema.org/position property. You can navigate a doubly-linked list with https://schema.org/nextItem or https://schema.org/previousItem on each ListItem.

What can we do in AS2?

It doesn't immediately make sense to put nextItem and previousItem style links directly on the object, because an object may be part of multiple collections. I guess you could maybe inject those properties when rendering or presenting an OrderedCollection, but this wouldn't really make sense either. The ordering in an OrderedCollection is already handled by JSON-LD's @list container, which in plain JSON would be an ordered array. (With paging, you can follow as:first/as:next* or you can follow as:last/as:prev* to traverse the entire OrderedCollection, one OrderedCollectionPage at a time.)

So maybe we could define some property orderType, which gives a hint as to how the collection is ordered? The value of this property is left kind of open-ended, but it could be handled by vocabulary terms, if you just define the orderType term as @type: @vocab. This would allow you to explicitly define classes for things like ReverseChronological, ForwardChronological, or any other well-defined specific concept.

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "orderType": {
        "@id": "https://w3id.org/fep/xxxx/orderType",
        "@type": "@vocab"
      },
      "ReverseChronological": "https://w3id.org/fep/xxxx/ReverseChronological",
      "ForwardChronological": "https://w3id.org/fep/xxxx/ForwardChronological"
    }
  ],
  "id": "https://domain.example/some-collection",
  "type": "OrderedCollection",
  "orderedItems": [
    "https://domain.example/objects/3",
    "https://domain.example/objects/2",
    "https://domain.example/objects/1"
  ],
  "orderType": "ReverseChronological"
}

This at least gives us one more option to be expressive; we are no longer assuming everything to be reverse chronological, but we can instead be explicit about things being forward chronological or reverse chronological.

Maybe we need a SortedCollection type analogue to OrderedCollection?

It may be that Adding something into the collection doesn't simply append it at the end, but rather causes the entire collection to be re-sorted. For example, consider a collection representing a conversation which is ordered "forward chronologically", but in constructing the collection, we missed an item somewhere in the middle. Rather than start over entirely, or just living with having the missed item be appended out-of-order at the end, we might instead define a new type of collection to handle this use case. Whereas the Collection represents an UnorderedSet data type, and an OrderedCollection represents an OrderedSet data type, we can define SortedCollection to represent a SortedSet. It inherits from OrderedCollection, since a SortedSet is just an OrderedSet that reorders itself upon any new insertion. We can also define a property sortType which is an analogue of the orderType we defined above.

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "orderType": {
        "@id": "https://w3id.org/fep/xxxx/orderType",
        "@type": "@vocab"
      },
      "ReverseChronological": "https://w3id.org/fep/xxxx/ReverseChronological",
      "ForwardChronological": "https://w3id.org/fep/xxxx/ForwardChronological",
      "SortedCollection": "https://w3id.org/fep/xxxx/SortedCollection",
      "sortType": {
        "@id": "https://w3id.org/fep/xxxx/sortType",
        "@type": "@vocab"
      }
    }
  ],
  "id": "https://domain.example/some-collection",
  "type": "SortedCollection",
  "orderedItems": [
    "https://domain.example/objects/4",
    "https://domain.example/objects/2",
    "https://domain.example/objects/1"
  ],
  "sortType": "ReverseChronological"
}

But now we run into a caveat: it might not be sufficient to define classes/types for sortType, because the sorting might happen based on some property instead. So it looks like we're gonna need another mechanism...

Maybe something like sortedBy and sortOrder? These would both be @type: @vocab. The sortedBy would be a functional property that indicates which property is being used for sorting, so for example, we could sort by as:published (although perhaps we shouldn't?); meanwhile, the sortOrder would take one of two type/class/vocab values: either Ascending or Descending. So our example object and context is starting to look like this:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "orderType": {
        "@id": "https://w3id.org/fep/xxxx/orderType",
        "@type": "@vocab"
      },
      "ReverseChronological": "https://w3id.org/fep/xxxx/ReverseChronological",
      "ForwardChronological": "https://w3id.org/fep/xxxx/ForwardChronological",
      "SortedCollection": "https://w3id.org/fep/xxxx/SortedCollection",
      "sortedBy": {
        "@id": "https://w3id.org/fep/xxxx/sortedBy",
        "@type": "@vocab"
      },
      "sortOrder": {
        "@id": "https://w3id.org/fep/xxxx/sortOrder",
        "@type": "@vocab"
      },
      "Ascending": "https://w3id.org/fep/xxxx/Ascending",
      "Descending": "https://w3id.org/fep/xxxx/Descending"
    }
  ],
  "id": "https://domain.example/some-collection",
  "type": "SortedCollection",
  "orderedItems": [
    "https://domain.example/objects/4",
    "https://domain.example/objects/2",
    "https://domain.example/objects/1"
  ],
  "sortedBy": "published",
  "sortOrder": "Ascending"
}

Now finally, we can send an Add and know for sure that it will be inserted in the right index/position of the underlying @list container:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://domain.example/some-activity",
  "actor": "https://domain.example/some-actor",
  "type": "Add",
  "object": "https://domain.example/objects/3",
  "target": "https://domain.example/some-collection"
}

We now expect that the result will be [4, 3, 2, 1] instead of [3,4,2,1].

We might also need a comparison function, but this is currently left unconsidered. Maybe we can generally assume that strings will be sorted lexically, and anything that can be coerced into a number (i.e. either it is a number in JSON-LD, or it is a @value whose @type is something like http://www.w3.org/2001/XMLSchema#integer) will be coerced into a number and compared by its magnitude.

Q3: Are there other ways to approach insertion into an OrderedCollection?

We might not want our OrderedSet to be a SortedSet specifically. We might want some specific ordering that is also specifically not sorted according to any criteria or property.

Inserting a new item at a specific position

In the most basic case, this might be doable with a property like insertAfter? This would allow us to specify where in the @list to insert the new item.

Say we have orderedItems: [4, 2, 1]. We might formulate the following Add activity:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "insertAfter": {
        "@id": "https://w3id.org/fep/xxxx/insertAfter",
        "@type": "@id"
      }
    }
  ],
  "id": "https://domain.example/some-activity",
  "actor": "https://domain.example/some-actor",
  "type": "Add",
  "object": "https://domain.example/objects/3",
  "insertAfter": "https://domain.example/objects/4"
}

Did you spot the flaw in this? We have a problem: the receiving server might not understand the insertAfter property, and therefore might process the Add in a way that we didn't expect! We end up with [3, 4, 2, 1] because the insertAfter: 4 wasn't understood, and so we didn't get the desired result of [4, 3, 2, 1].

This can be fixed by changing the Add to an Insert. Now, the server has to specifically understand the Insert activity and its side effects, or else the activity will not be processed.

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "insertAfter": {
        "@id": "https://w3id.org/fep/xxxx/insertAfter",
        "@type": "@id"
      },
      "Insert": "https://w3id.org/fep/xxxx/Insert"
    }
  ],
  "id": "https://domain.example/some-activity",
  "actor": "https://domain.example/some-actor",
  "type": "Insert",
  "object": "https://domain.example/objects/3",
  "insertAfter": "https://domain.example/objects/4"
}

We could also define an insertBefore property, but I think this isn't necessary and overall just complicates things. Generally when updating a linked list, you traverse it going forwards; the @list of orderedItems is also presented in forward ordering. It therefore makes the implementation simpler at a data structure level to only define insertAfter and no insertBefore.

One remaining caveat has to do with ActivityPub delivery. In the case that we send an activity to our local server via C2S, we are not guaranteed that it will have any side effects processed; the activity may be malformed or otherwise not processable. But the presence of addressing properties like to, cc and audience will cause the activity to be delivered to the recipients regardless of whether it had any local side effects -- perhaps the remote server will know what to do with the activity. If the remote server also doesn't know how to process the activity, then you end up with a no-op on both your local server and on the remote server. Perhaps some client reading their inbox might be able to derive some meaning from it...?

Inserting multiple items? (WIP)

Maybe we could do "object": {"@list": [...]} but this might not be friendly to LD-unaware processors. Also this doesn't allow for inserting multiple items at different locations, but maybe those should be separate Insert activities/operations anyway.

Moving an item to a new position in the list? (WIP)

Just like we defined an Insert activity earlier, we might want to use something like a Move activity to reorder our OrderedCollection. Say we have a list in the form orderedItems: [3, 4, 2, 1], perhaps by some permutation of aware or unaware servers leading to the error case from before. Well, how do we fix this?

What we want in this case is to move 3 to be after 4. (Alternatively, we can achieve the same result by moving 4 to come before 3, but we already established that "insert before" is harder than "insert after". If we need to reconsider this, then it can be reconsidered.)

We might formulate "move 3 to be after 4" with the following Move activity:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "insertAfter": {
        "@id": "https://w3id.org/fep/xxxx/insertAfter",
        "@type": "@id"
      },
      "Insert": "https://w3id.org/fep/xxxx/Insert"
    }
  ],
  "id": "https://domain.example/some-activity",
  "actor": "https://domain.example/some-actor",
  "type": "Move",
  "object": "https://domain.example/objects/3",
  "origin": "https://domain.example/some-collection",
  "target": "https://domain.example/some-collection",
  "insertAfter": "https://domain.example/objects/4"
}

This Move activity is equivalent to doing a Remove first, followed by a new Insert; compare to a Move activity normally being equivalent to doing a Remove first and then following up with an Add.

But we run into the same problem as before -- namely, it's not clear whether Move in this case means "Remove from origin and Add to target", or if it means "Remove from origin and Insert into target". So we probably need a new activity type for this kind of operation. I have no idea what to call it.

For now, the safest thing is probably to fall back to doing this in 2 activities: first, send a Remove, then send an Insert.

Completely reordering the items

Given a complex enough OrderedCollection, you might find it easier to just create a new collection and Add or Insert items as appropriate... but this is not an option if you want to preserve the collection's id. (Well, it could be an option, but you'd have to define a mechanism to reassign identifiers first. Perhaps a Migrate activity could be defined in some other FEP? Alternatively, Move might work, but I'd be wary of overloading the semantics of "moving" an object. Of course, Mastodon's "account migration" feature already uses Move in exactly this overloading way, so...)

So given the same OrderedCollection, we might consider an Update that changes the orderedItems around:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://domain.example/some-activity",
  "actor": "https://domain.example/some-actor",
  "type": "Update",
  "object": {
    "id": "https://domain.example/some-collection",
    "orderedItems": [
      "https://domain.example/objects/4",
      "https://domain.example/objects/3",
      "https://domain.example/objects/2",
      "https://domain.example/objects/1",
    ]
  }
}

In the case where an OrderedCollection is unpaged but just very very large, then the answer is to simply construct a similarly large Update activity. But the larger the collection grows, the more likely it is to be paged. So if the OrderedCollection is paged and the pages are reified, you might only be able to Update one page at a time. You might even find yourself needing to Move items between one reified page and another! So maybe you can construct a new OrderedCollectionPage and carefully update the next and prev links of what is effectively a linked list. The exact algorithm for doing this is left as an exercise for the reader, because this is already getting to be a very long post, and adding more examples for this increasingly niche and situation-dependent use case is going to make it even longer... eh, perhaps in a follow-up.

Current summary and conclusions

  • An OrderedCollection's items are expressed in JSON-LD as a @list but the data type is more correctly an ordered set, not an ordered list. This is because of the AS2-Vocab definitions. This means that items can only be included in a Collection at most once.
  • Ordering is determined by time-of-insertion and can be signalled by something like a property orderType and some classes like ForwardChronological and ReverseChronological to indicate if the latest appended item will be the first item (LIFO/FILO) or if the latest appended item will be the last item (LILO/FIFO).
  • We could define a new type SortedCollection specifically for collections that are not only ordered, but also sorted according to some criteria. Time-of-insertion doesn't matter here, as any newly inserted item will take its place within the sorting. We might define the sortType somehow using vocab terms again, but it is more useful to use sortedBy pointing to a property name, and sortOrder can be one of two classes, either Ascending (smallest first) or Descending (largest first). We might also need a comparison function, but this is currently left unconsidered -- it can be generally expected that strings can be sorted lexigraphically and numbers can be sorted by magnitude (with some consideration toward coercing any string literal @value if its @type is something like http://www.w3.org/2001/XMLSchema#integer).
  • For cases where the collection is ordered but not sorted, we could define an Insert activity and an insertAfter property to point to the object after which the Insert.object will be inserted. Currently left unconsidered is the possibility of inserting at a specific index numerically (which would likely cause problems if processed out-of-order), as well as the ability to insertBefore.
    • Inserting multiple objects into an OrderedCollection also needs to be thought about.
    • Moveing items typically is equivalent to Remove then Add. We might need an activity that is equivalent to Remove then Insert, but I have no idea what to call it. For now, the safest thing is probably to fall back to doing this in 2 activities: first, send a Remove, then send an Insert.
    • Items might be completely reordered with an Update targeting the orderedItems property but this can be kinda complicated when dealing with paged collections, especially if pages are reified. You will possibly have to construct new pages and manipulate the next/prev links between them.
  • For now, we neglect to consider or define non-set types, such as a potential List which is explicitly allowed to contain the same item multiple times. For this, use the JSON-LD @list or the rdf:List class with rdf:first, rdf:rest, and rdf:nil.