adr-7: decided
decided 2024-01-02

Data Model for Generic Posts

We can expect almost any of the ActivityStreams Object types to federate in a way that resembles a post, and we will eventually want to do the same; there's no need to limit ourselves to Notes. This design offers a way to represent posts in a timeline, thread, etc., independently of their specific content.

Decision

The basic idea is to decouple the posts that make up a feed or thread from their content. We define a Post type, which acts as a container for the type-specific content that populates it. We would start with Notes and Images as content, but other types like Polls, Events, Articles, and so on would be supported in the future. Each content record can belong to at most one Post. Here's a simplified model of what those relations look like. Profile and Audience already exist and would be essentially unchanged.

erDiagram
    Note zero or one -- only one Post : posted
    Image 0+ -- only one Post : posted
    Etc 0+ -- only one Post : posted
    Post 0+ -- 1+ Profile : created-by
    Post 0+ -- 0+ Profile : liked-by
    Post 0+ -- 0+ Profile : shared-by
    Post 0+ -- 0+ Profile: addressed-to
    Post 0+ -- 0+ Audience : visible-to
    Post 0+ -- zero or one Post : in-reply-to
    Profile 1+ -- 1+ Audience : member-of
    
    Post {
        uuid id PK
        uri uri "indexed"
        uri thread "indexed"
        timestamp created
        timestamp updated
        string client
    }
    
    Note {
        uuid id PK
        uri uri "indexed"
        string summary
        string preview
        string source
    }
    
    Image {
        uuid id PK
        uri uri "indexed"
        string summary
        string preview
        string source
    }
    
    Etc {
        uuid id PK
        uri uri "indexed"
        string summary
        string preview
        string source
    }
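The cardinalities in the diagram can be approximated in memory roughly like this. It is a minimal sketch, not the project's model: `ContentItem`, its `kind` string, and the `attach` method are hypothetical names, and raising `ValueError` is an assumed way of enforcing "each content record can belong to at most one Post" and "zero or one Note per Post".

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class ContentItem:
    """Hypothetical stand-in for a Note, Image, or other content row."""
    uri: str
    kind: str  # e.g. "Note" or "Image"
    id: UUID = field(default_factory=uuid4)
    post_id: Optional[UUID] = None  # set when attached; never reassigned


@dataclass
class Post:
    uri: str
    thread: str
    id: UUID = field(default_factory=uuid4)
    in_reply_to: Optional["Post"] = None  # zero or one parent Post
    contents: list[ContentItem] = field(default_factory=list)

    def attach(self, item: ContentItem) -> None:
        if item.post_id == self.id:
            return  # already attached to this post; nothing to do
        # Each content record can belong to at most one Post.
        if item.post_id is not None:
            raise ValueError(f"{item.uri} already belongs to another post")
        # Zero or one Note per Post, but any number of other items.
        if item.kind == "Note" and any(c.kind == "Note" for c in self.contents):
            raise ValueError("a post can contain at most one Note")
        item.post_id = self.id
        self.contents.append(item)
```

In a database the same rule would more likely be a single non-nullable `post_id` column on each content table rather than a runtime check.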
    

So, everything becomes a post, with some child content items. It doesn't currently happen (much?) in the wild, but this would let us easily handle activities that include multiple content-like objects (multiple notes, images, questions, mix-and-match, etc.). The types above are illustrative, not complete or final. The idea is that they would implement an interface, call it IContent, which provides enough information to build a feed of posts. That would make it much easier to actually serve those feeds to clients. Without something like that, clients have to make at least one follow-up request to get the actual content of feed items that are otherwise only a list of IDs/URLs.

classDiagram
    direction LR
    class IContent {
        <<interface>>
        Uuid Id
        Uri Uri
        Uri Thread
        string? Summary
        string? Preview
        string? Source
        
        Type() string
    }
    
    class Post {
        Uuid Id
        Uri Uri
        Uri Thread
        DateTime Created
        DateTime Updated
        IList~IContent~ Contents
        string Client
    }
    
    class Note {
        string Content
        Type() "Note"
    }

    class Image {
        MediaType MediaType
        Type() "Image"
    }
    
    class FeedDto {
        Post Post
        Profile CreatedBy
        Profile SharedBy
        Post InReplyTo
        Uri Thread
        Content Content
    }
    
    class Content {
        Uuid Id
        Uri Uri
        Uri Thread
        string? Summary
        string? Preview
        string? Source
        
        Type()*
    }

    Content <|-- Note : inherits
    Content <|-- Image : inherits
    Content <|-- Etc : inherits
    IContent <|.. Content : implements
    Post *-- IContent : contains
    Post --> FeedDto : maps-to
    FeedDto *-- Content : contains
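The class diagram can be approximated in Python roughly as follows, mainly to show that a FeedDto carries the content itself rather than a URI to resolve later. This is a sketch under assumptions, not the project's implementation: the snake_case names, a `Profile` reduced to a `handle`, folding IContent into an abstract `Content` base class, and a FeedDto that holds a list of contents and handles/URIs (instead of the single `Content` and full objects in the diagram) are all hypothetical choices.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class Content(ABC):
    """Base content type; the fields mirror IContent in the diagram."""
    uri: str
    thread: str
    summary: Optional[str] = None
    preview: Optional[str] = None
    source: Optional[str] = None
    id: UUID = field(default_factory=uuid4)

    @abstractmethod
    def type(self) -> str: ...


@dataclass
class Note(Content):
    content: str = ""

    def type(self) -> str:
        return "Note"


@dataclass
class Image(Content):
    media_type: str = "application/octet-stream"  # hypothetical default

    def type(self) -> str:
        return "Image"


@dataclass
class Profile:
    handle: str  # hypothetical; the real Profile model has much more


@dataclass
class Post:
    uri: str
    thread: str
    created_by: Profile
    contents: list[Content] = field(default_factory=list)
    in_reply_to: Optional["Post"] = None
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class FeedDto:
    """Everything a client needs to render one feed item in a single response."""
    post_uri: str
    created_by: str
    thread: str
    in_reply_to: Optional[str]
    contents: list[Content]
    shared_by: Optional[str] = None


def to_feed_dto(post: Post, shared_by: Optional[Profile] = None) -> FeedDto:
    # The contents travel with the post, so no follow-up request per item.
    return FeedDto(
        post_uri=post.uri,
        created_by=post.created_by.handle,
        thread=post.thread,
        in_reply_to=post.in_reply_to.uri if post.in_reply_to else None,
        contents=list(post.contents),
        shared_by=shared_by.handle if shared_by else None,
    )
```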

Impact

This will obviously introduce some new data types. It's also likely to require some complex migrations. It should ideally happen soon, before any serious work starts on posting-related features.

You may also notice that this data model is more restrictive than what ActivityPub allows. Notably, a Post can be inReplyTo at most one other Post (and no other kind of object). A Note can be in only one Post, and a Post can contain at most one Note. And because IContent.Thread would map to AS context, a post can have only one context and exist in only one thread. These restrictions should be fully compatible with the actual behavior of other fedi services in the wild. They narrow the possibility space we actually have to deal with, and should support faster queries for the most common case (a post that is a single note and nothing else). The trade-off is that some potential future fedi peers might not be readily compatible with us.
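In AS, `inReplyTo` and `context` may each hold a URI string, an embedded object, or an array, so ingesting a document means collapsing each one to at most one URI. A minimal sketch, with hypothetical function names; rejecting multi-valued properties (rather than, say, keeping the first value) is an assumed policy, not something this ADR specifies.

```python
from typing import Any, Optional


def single_ref(value: Any, prop: str) -> Optional[str]:
    """Reduce an AS property (URI, object, or list) to at most one URI."""
    if value is None:
        return None
    if isinstance(value, list):
        if not value:
            return None
        if len(value) > 1:
            # Our model allows only one parent / one thread.
            raise ValueError(f"{prop} has {len(value)} values; only one is supported")
        value = value[0]
    if isinstance(value, str):
        return value
    if isinstance(value, dict) and isinstance(value.get("id"), str):
        return value["id"]  # embedded object: keep only its id
    raise ValueError(f"unsupported {prop} value: {value!r}")


def reply_and_thread(doc: dict[str, Any]) -> tuple[Optional[str], Optional[str]]:
    # Post.in-reply-to is zero or one; IContent.Thread maps to AS context.
    return (single_ref(doc.get("inReplyTo"), "inReplyTo"),
            single_ref(doc.get("context"), "context"))
```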

Context

The content-related data schema we have now is troublesome to work with, and that will only get worse over time. It was developed quickly, very early in the project, and it was always likely that it wouldn't be viable in the long term. The original schema was useful to provide some anchor points to develop an internal data model that isn't simply replicating all of ActivityPub (a likely impossible task). It also provided a point to iterate from. Now we're iterating, with more care and more context.

For reference, in the current model various classes (Note and Image at the moment) implement an IContentRef interface, and that's about it. Those objects can be assigned a resolvable URI based on their types, and those URIs are the critical component of Feed records. In practice, a client would retrieve a batch of feed records consisting of little more than URIs, and would then have to issue an individual follow-up request to resolve each one. That is not great. It's also surprisingly difficult to map AP docs to these objects. I'm not sure this design solves that problem on its own, but I think it does make it easier to build a ContentFactory or similar, which was likely always going to be necessary. The factory would map AP docs to Content objects and handle the logic of deciding which concrete type to construct.
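A ContentFactory could start as little more than a table keyed on the AS `type` field. A minimal sketch, assuming simplified `Note` and `Image` classes and hypothetical defaults; it returns `None` for types it doesn't know (including multi-valued `type` arrays), leaving the caller to decide whether to discard the document.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Note:
    uri: str
    content: str
    summary: Optional[str] = None

    def type(self) -> str:
        return "Note"


@dataclass
class Image:
    uri: str
    media_type: str
    summary: Optional[str] = None

    def type(self) -> str:
        return "Image"


class ContentFactory:
    """Hypothetical factory that picks a concrete content class from an AS document."""

    def __init__(self) -> None:
        self._builders: dict[str, Callable[[dict[str, Any]], Any]] = {
            "Note": lambda d: Note(d["id"], d.get("content", ""), d.get("summary")),
            "Image": lambda d: Image(
                d["id"], d.get("mediaType", "application/octet-stream"), d.get("summary")
            ),
        }

    def create(self, doc: dict[str, Any]) -> Optional[Any]:
        kind = doc.get("type")
        # AS allows "type" to be an array; this sketch only handles a single string.
        if not isinstance(kind, str) or kind not in self._builders:
            return None
        return self._builders[kind](doc)
```

Adding a new content type then means adding one entry to the table, which keeps the type-selection logic in one place.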

Discussion

notes or just post body?

Is it better to have notes as a content object of the post, or should the note's fields just be body properties on the post itself? It's probably more complicated to serialize reliably if the post body becomes a note on the wire. But it would save a join for the large majority of records. Sort of. The join is still necessary for the query, because we still need the other content items. But there would be no results from the other joined tables most of the time, and I expect the query planner to handle that efficiently.
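For the Note-as-content option, the query shape looks roughly like this. SQLite is used purely for illustration, and the table and column names are assumptions: the join against `image` stays in the query, but for a note-only post it just contributes NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post  (id INTEGER PRIMARY KEY, uri TEXT NOT NULL);
    -- UNIQUE post_id: zero or one Note per Post
    CREATE TABLE note  (id INTEGER PRIMARY KEY,
                        post_id INTEGER NOT NULL UNIQUE REFERENCES post(id),
                        source TEXT);
    -- any number of Images per Post
    CREATE TABLE image (id INTEGER PRIMARY KEY,
                        post_id INTEGER NOT NULL REFERENCES post(id),
                        uri TEXT);
    CREATE INDEX image_post_id ON image(post_id);

    INSERT INTO post (id, uri) VALUES (1, 'https://example.test/posts/1'),
                                      (2, 'https://example.test/posts/2');
    INSERT INTO note (post_id, source) VALUES (1, 'hello'), (2, 'look');
    INSERT INTO image (post_id, uri) VALUES (2, 'https://example.test/cat.gif');
""")

# The image join is always present; for a note-only post it yields NULLs.
rows = conn.execute("""
    SELECT p.id, n.source, i.uri
    FROM post p
    LEFT JOIN note n  ON n.post_id = p.id
    LEFT JOIN image i ON i.post_id = p.id
    ORDER BY p.id, i.id
""").fetchall()
print(rows)
# [(1, 'hello', None), (2, 'look', 'https://example.test/cat.gif')]
```

With several one-to-many content tables in the same query, each extra LEFT JOIN can multiply the rows per post, so a real query might need to aggregate or query each content table separately.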

schema

Making each content record includable in only one post saves us from potentially a ton of many-to-many joins. It does make it more involved to reuse uploaded media, for example posting the same reaction gif repeatedly. We would need to introduce other models to track those uploads as files, not just as post content. But that seems worth it to me.
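The trade-off shows up directly in the schema: ownership is a column on the content row instead of a join table, so "reusing" media means copying the row. A minimal SQLite sketch; the table names and the `reuse_image` helper are hypothetical, and the shared `source` URL stands in for the file-tracking model that this ADR defers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    -- Ownership lives on the content row (no post_image join table),
    -- so each image row can only ever point at one post.
    CREATE TABLE image (id INTEGER PRIMARY KEY,
                        post_id INTEGER NOT NULL REFERENCES post(id),
                        source TEXT NOT NULL);
    INSERT INTO post (id) VALUES (1), (2);
    INSERT INTO image (post_id, source)
        VALUES (1, 'https://example.test/media/reaction.gif');
""")


def reuse_image(conn: sqlite3.Connection, image_id: int, new_post_id: int) -> None:
    """Hypothetical reuse: copy the content row; only the underlying file is shared."""
    conn.execute(
        "INSERT INTO image (post_id, source) SELECT ?, source FROM image WHERE id = ?",
        (new_post_id, image_id),
    )


reuse_image(conn, 1, 2)
print(conn.execute("SELECT id, post_id, source FROM image ORDER BY id").fetchall())
# [(1, 1, 'https://example.test/media/reaction.gif'), (2, 2, 'https://example.test/media/reaction.gif')]
```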

We can defer that model design until we start work on the file/media features.

hypothetical compatibility

Not attempting to be universally future compatible seems reasonable. It's not like it would be possible anyway. We can work to improve compatibility in the future, if such an incompatible peer service should ever be created.

need the interface?

Do we need IContent, or is the base/abstract Content class enough?

AS inReplyTo

AP/AS have some types that are pretty obviously content: Note, Image, Document, Video, Audio, Article, Page, Event, Question. It's not hard to figure out what it means for something to be inReplyTo one of those types. But, Object is the base type for everything in ActivityStreams, and the spec permits any object to be inReplyTo any other object. What does it mean if a Note is in reply to a Collection, or a Profile, or an Update? I don't know, and neither do the spec authors. So, we're just not dealing with that. If we ever start receiving federated documents doing that, then we can decide how to handle it. And honestly that might mean we just choose to discard those activities for being incomprehensible to our data model. It seems like that would have to be either a very different app than ours, or just a nonsensical implementation.
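Discarding replies to non-content objects could be a simple allow-list check on the reply target's type. A minimal sketch: `should_accept` is a hypothetical name, and `resolve_type` is an assumed lookup (local database or remote fetch) from a URI to its AS type; only the list of content types comes from the paragraph above.

```python
from typing import Any, Callable, Optional

# The AS types this ADR treats as "obviously content".
CONTENT_TYPES = frozenset({
    "Note", "Image", "Document", "Video", "Audio",
    "Article", "Page", "Event", "Question",
})


def should_accept(doc: dict[str, Any],
                  resolve_type: Callable[[str], Optional[str]]) -> bool:
    """Return False when doc replies to something that isn't content."""
    target = doc.get("inReplyTo")
    if target is None:
        return True  # not a reply, nothing to check
    if isinstance(target, str):
        target_type = resolve_type(target)
    elif isinstance(target, dict):
        target_type = target.get("type") or resolve_type(target.get("id", ""))
    else:
        return False  # e.g. a list of targets, which the model can't represent
    # The type may be missing, unknown, or an array; only a known string passes.
    return isinstance(target_type, str) and target_type in CONTENT_TYPES
```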

Quotes?

How does this design model quote posts?

There are a couple of options. We can experiment when we get to that point, but the one that seems most likely is if Post also implements IContent, and then other posts can simply be part of the contents collection. See the PR for some discussion.
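The option where Post implements IContent could look something like this. It is a sketch under that assumption only: Post implements the interface directly without inheriting Content's fields (possibly one argument for keeping IContent), and the `"Post"` type string and `quoted()` helper are hypothetical. It also ignores the one-Post-per-content-record rule, which a quoted post would presumably need to be exempt from.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class IContent(ABC):
    """Stand-in for the IContent interface: anything a Post can contain."""
    uri: str

    @abstractmethod
    def type(self) -> str: ...


@dataclass
class Note(IContent):
    uri: str
    content: str

    def type(self) -> str:
        return "Note"


@dataclass
class Post(IContent):
    uri: str
    contents: list[IContent] = field(default_factory=list)

    def type(self) -> str:
        return "Post"  # hypothetical discriminator for an embedded (quoted) post

    def quoted(self) -> list["Post"]:
        # A quote is just a Post that appears among another Post's contents.
        return [c for c in self.contents if isinstance(c, Post)]
```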