Core 2.0: Transports

dimitrie · 26 August 2020 09:14

Preamble

Transports are the last important concept of the 2.0 SDK that needs to be introduced. As a quick reminder, previously we’ve covered:

Object Models & Kits, which is Speckle’s take on interoperability,
The base object, which controls how data is represented, and allows for both dynamic and strongly typed properties, and
The decomposition API, which controls how data is structured, and resolves the tree vs. blob dichotomy.

Next up is how Speckle transports design data to and from a storage layer - a Speckle Server, a local database, or even a drive.

Introduction

One of the most common use-cases we’ve encountered throughout the past 5 years of interacting with developers from the AEC space is “writing data to a database”, and, obviously, getting it back out again. These databases vary - sometimes they’re text files on a network drive, sometimes they’re Azure SQL provisions, sometimes it’s MongoDB, sometimes it’s an Excel file, sometimes it’s core rope memory (joking, I’ve never seen that one - yet!).

An important part of our mission at Speckle is to empower developers across AEC. We want to give you a leg up so you can “automate all the things” and build powerful applications that leverage modern tech.

By introducing the concept of Transports in Speckle 2.0, we’re giving developers in the AEC space full control over where and how this data is stored. By contrast, in 1.0, you could only write to one place: a given Speckle Server (and a local cache, to be pedantic).

In 2.0, Transports allow Speckle to switch, choose and mix various storage layers. This allows us to write better, more memory efficient and predictable connectors. It also allows you, as a developer, to use Speckle as a springboard in your development work in a much more flexible way.

Definitions

Before we dive in and see how they work, let’s get some definitions out of the way:

Transports

In Speckle 2.0, transports take Speckle Objects from memory and persist them to a specific storage layer. Of course, they’re also responsible for taking those objects out of the storage layer and back into memory.
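To make the definition concrete, here is a minimal Python sketch of what such a contract could look like. The names `Transport`, `InMemoryTransport`, `save_object` and `get_object` are hypothetical and chosen for illustration only; this is not the SDK's actual `ITransport` interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Transport(ABC):
    """Hypothetical minimal transport contract (not the SDK's ITransport)."""

    @abstractmethod
    def save_object(self, object_id: str, serialised: str) -> None:
        """Persist one serialised object under its id."""

    @abstractmethod
    def get_object(self, object_id: str) -> Optional[str]:
        """Return the serialised object, or None if it is not stored here."""


class InMemoryTransport(Transport):
    """Keeps objects in a dict, so nothing touches the disk."""

    def __init__(self) -> None:
        self._objects: Dict[str, str] = {}

    def save_object(self, object_id: str, serialised: str) -> None:
        self._objects[object_id] = serialised

    def get_object(self, object_id: str) -> Optional[str]:
        return self._objects.get(object_id)


# Memory -> storage layer -> memory again.
transport = InMemoryTransport()
transport.save_object("abc123", '{"name": "column"}')
stored = transport.get_object("abc123")
missing = transport.get_object("does-not-exist")
```

Any storage layer that can answer these two calls (a database, a folder, a web server) could sit behind the same contract.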

Storage Layers

The storage layer that a transport writes to can be, ultimately, anything you want it to be. The Core 2.0 SDK will implement a couple of them, specifically:

an SQLite Transport
an In-Memory Transport
a Speckle Server 2.0 Transport
a Disk Transport

Other possible targets for transports can be MongoDB databases, S3 storage, a network drive, etc. You can also write a transport that writes to more than one place, but we advise against it - the 2.0 .NET API provides a better way to achieve the same thing.

Moreover, in the case of disk-based transports (i.e., disk, SQLite, etc.) you can also control where this data is stored.

Sending Data: Serialisation and Writing

Serialisation is a potentially quite wasteful operation depending on the size of the data. One of our goals in 2.0 is to be able to support arbitrarily large models without adding too much overhead.

Extra Boring Details

Why can serialisation be wasteful, especially in terms of memory usage? Imagine you have a model with 100k building objects (geometry & data) open in an application. To send that model to Speckle, we would need to (1) convert those objects into Speckle objects, using a given converter, then (2) serialise them to whatever format we’d be ingesting down the line (e.g., a JSON string), and, finally, (3) transport them to Speckle. Over the course of these steps, your computer’s memory holds three representations of the same object - Native, Speckle, and String. Not that cool!

To reduce memory overhead, Speckle now integrates serialisation with writing objects to one or more persistence layers via transports. Whenever an object finishes serialisation, it’s sent to a transport for storage and, ultimately, its string representation gets “garbage collected”. Just like that, we’ve knocked off some memory usage.
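The idea of handing each object to storage as soon as it is serialised can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Core 2.0 is implemented: `converted_objects` and `send_streaming` are hypothetical names, and a newline-delimited JSON file stands in for the storage layer.

```python
import json
import os
import tempfile
from typing import Dict, Iterator


def converted_objects(n: int) -> Iterator[Dict]:
    """Yield converted objects lazily, standing in for a converter's output."""
    for i in range(n):
        yield {"id": i, "type": "column", "height": 3.0}


def send_streaming(objects: Iterator[Dict], path: str) -> int:
    """Serialise objects one at a time and write each string out immediately."""
    count = 0
    with open(path, "w", encoding="utf-8") as storage:
        for obj in objects:
            serialised = json.dumps(obj)
            storage.write(serialised + "\n")
            # `serialised` is rebound on the next iteration, so at most one
            # string representation is referenced by this loop at any time.
            count += 1
    return count


storage_path = os.path.join(tempfile.mkdtemp(), "objects.jsonl")
written = send_streaming(converted_objects(1000), storage_path)
```

Because the objects come from a generator and each string is written straight away, the full set of serialised strings never has to exist in memory at once.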

The 2.0 API exposes two lower level methods for sending (serialising and writing) and receiving (reading and deserialising) data. In this section, we’ll look only at the send one - the section below will provide more detail on receiving data.

First off, let’s start with the simplest, default, use case:

// First, let's create a simple base object to store our building data in. 
var myData = new Base(); 
myData["@columns"] = new List<Columns>() { ... the columns }; 
myData["@rooms"] = new List<Rooms>() { ... the rooms}; 

// Let's now send the data:
var myDataId = await Operations.Send( myData );

Looks quite simple! In this scenario, the Send method serialised and saved the myData object, together with all its detachable columns and rooms to the default Speckle local transport (an SQLite db stored in ~/AppData/Speckle), and returned the id (hash) of the myData object, which you can use to receive it back again.
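Why an id that is also a hash is useful can be shown with a small Python sketch. The hashing scheme here (SHA-256 over canonical JSON) is an assumption made for illustration; this post does not specify which algorithm Speckle actually uses, and `object_id` is a hypothetical helper.

```python
import hashlib
import json
from typing import Any, Dict


def object_id(obj: Dict[str, Any]) -> str:
    """Derive an id from content: same content, same id (assumed scheme)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = object_id({"type": "column", "height": 3.0})
b = object_id({"height": 3.0, "type": "column"})  # same content, other key order
c = object_id({"type": "column", "height": 3.5})  # different content
```

Here `a` and `b` are equal while `c` differs, which is what lets an id identify an object regardless of where it is stored.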

Let’s look at a more advanced example:

var myRevisionId = await Operations.Send(
  myData,
  transports: new ITransport[] {
    new ServerTransport( args ), // imagine this one server...
    new ServerTransport( args ), // ... and this is a different one! you're basically pushing to multiple remotes!
    new MyCustomTransport( args ) // ... and, why not another transport?
  }
);

What’s happening here is that we’re passing in multiple transports to the Send method. Speckle will persist the given object, with all its detachable sub-children, to all these transports, and, when done, return the object’s id (hash!).
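The fan-out behaviour described here can be modelled with a short Python sketch. It is a simplified stand-in rather than the real `Operations.Send`: `DictTransport` and `send` are hypothetical, and the SHA-256 id is an assumed hashing scheme.

```python
import hashlib
import json
from typing import Any, Dict, List


class DictTransport:
    """Hypothetical transport that stores serialised objects in a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[str, str] = {}

    def save_object(self, object_id: str, serialised: str) -> None:
        self.objects[object_id] = serialised


def send(obj: Dict[str, Any], transports: List[DictTransport]) -> str:
    """Serialise once, write to every transport, return a single id."""
    serialised = json.dumps(obj, sort_keys=True)
    object_id = hashlib.sha256(serialised.encode("utf-8")).hexdigest()
    for transport in transports:
        transport.save_object(object_id, serialised)
    return object_id


targets = [DictTransport("server-a"), DictTransport("server-b"), DictTransport("custom")]
my_id = send({"name": "my building"}, targets)
```

The object is serialised once and the same id is valid in every target, so any of them can later serve as the source for a receive.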

Receiving Data: Reading and Deserialisation

Again, let’s start with the simplest use case possible:

// To set the stage, in the previous code sample, we've done:
var myDataId = await Operations.Send( myData ); 

// Now, we want to get it back out. It's as simple as:
var myData = await Operations.Receive( myDataId );

This just retrieves the object using the default SQLite Transport. No accounts, no extra HTTP requests, no hassle. Of course, this is enough only for “local” operations that are bound to a single computer.

If you want this to work across a local network, you can always create a SQLite Transport (or your own custom transport for that matter) that stores data on a network accessible drive. For large collaborative environments, you can, of course, receive your data from a Speckle Server.

While you can Send data to multiple places at the same time, you can’t really receive data from multiple places at the same time. Hence, Receive can only accept one transport as the source:

var myData = await Operations.Receive(
	myDataId, 
	remoteTransport: new ServerTransport( ... ) // where you want to receive things from! 	
);

Done! myData is now your data - a nicely deserialised Base object.

Advanced Transports

The Send and Receive functions in the examples above assume a shared global cache located in ~/User/.config/Speckle. In the case of Send, this means that objects will be written to a SQLite Transport that Speckle always uses; in the case of Receive, the same SQLite Transport will be used to buffer to/from the remote transport.
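The buffering role of the local transport on receive can be sketched in Python as follows. This is a simplified model under assumptions, not the SDK's code: `DictTransport` and `receive` are hypothetical names, and the read counter exists only to make the caching visible.

```python
from typing import Dict, Optional


class DictTransport:
    """Hypothetical transport backed by a dict; counts reads for the demo."""

    def __init__(self, objects: Optional[Dict[str, str]] = None) -> None:
        self.objects: Dict[str, str] = dict(objects or {})
        self.reads = 0

    def get_object(self, object_id: str) -> Optional[str]:
        self.reads += 1
        return self.objects.get(object_id)

    def save_object(self, object_id: str, serialised: str) -> None:
        self.objects[object_id] = serialised


def receive(object_id: str, remote: DictTransport, local: DictTransport) -> str:
    """Try the local transport first; on a miss, fetch remotely and cache."""
    cached = local.get_object(object_id)
    if cached is not None:
        return cached
    fetched = remote.get_object(object_id)
    if fetched is None:
        raise KeyError(f"object {object_id} not found in the remote transport")
    local.save_object(object_id, fetched)
    return fetched


remote = DictTransport({"abc": '{"type": "room"}'})
local = DictTransport()
first = receive("abc", remote, local)   # goes to the remote, fills the cache
second = receive("abc", remote, local)  # served from the local transport
```

After the two calls the remote has been read only once, which is the point of keeping a local transport in between.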

Nevertheless, there are some use cases where you don’t want to, or cannot, use the shared global cache. For example, you might want to go full-on classic version control, and store your history right next to your project’s file:

`-- MyProject_A            # root folder
    |-- .speckle           # the ".git" folder
    |   |-- references.db  # commits, branches, tags, etc. storage
    |   `-- objects.db     # object storage
    `-- MyProject_A.rvt    # source file

How would you actually implement such a system?

First, you can always customise the location of your transport. For example, in the case of the SQLite Transport, you can specify the folder in which to create it. Let’s assume we want to send our data to only two transports: a local SQLite one that’s sitting in the .speckle folder just next to our Revit file, and a Speckle Server.

var customLocalTransport = new SQLiteTransport( basePath: @"{localFolderPath}/.speckle" ); 
var serverTransport = new ServerTransport(...);
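For readers curious about what a project-local object store could look like underneath, here is a rough Python sketch using the standard sqlite3 module. The table layout and the `open_object_store` helper are assumptions made for illustration; they are not the schema that the SQLite Transport actually uses.

```python
import os
import sqlite3
import tempfile


def open_object_store(base_path: str) -> sqlite3.Connection:
    """Create (or open) objects.db inside the given folder."""
    os.makedirs(base_path, exist_ok=True)
    connection = sqlite3.connect(os.path.join(base_path, "objects.db"))
    # Assumed layout: one row per object, keyed by its id (hash).
    connection.execute(
        "CREATE TABLE IF NOT EXISTS objects (id TEXT PRIMARY KEY, content TEXT)"
    )
    return connection


project_folder = tempfile.mkdtemp()  # stands in for the MyProject_A folder
store = open_object_store(os.path.join(project_folder, ".speckle"))
store.execute(
    "INSERT OR REPLACE INTO objects (id, content) VALUES (?, ?)",
    ("abc123", '{"type": "wall"}'),
)
store.commit()
row = store.execute(
    "SELECT content FROM objects WHERE id = ?", ("abc123",)
).fetchone()
```

The database file ends up right next to the project, so it can be moved or archived together with it.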

The Send method can take an optional useDefaultCache: false argument to bypass the global cache, and send exclusively to the provided transports:

var myDataId = await Operations.Send( 
  myData, 
  transports: new ITransport[] { customLocalTransport, serverTransport }, 
  useDefaultCache: false // will bypass the global cache! 
);

When receiving data, if you want to use your custom local transport, the method would look like this:

var myData = await Operations.Receive( 
  objectId: myDataId, 
  remoteTransport: serverTransport,
  localTransport: customLocalTransport // will use this one instead of the global cache!
);

Another use case is in constrained environments where you don’t have a persistent storage layer handy, such as a serverless function or a Docker container. That’s where you would probably want to use a MemoryTransport - it’s obviously much faster and it doesn’t need to touch the disk.

Summary

Transports give you, as a developer, a lot of flexibility in how you can use Speckle, so hopefully “automating all the things” will become a bit easier and nicer. They make it easy to customise your workflows and where and how you store data - from the Speckle Server, to one or more databases, local or remote.

Our next steps are to wrap up some little TODOs and publish an alpha release, specifically geared towards developers (it won’t contain any connectors!) with the 2.0 Core, Server and Kits. In the meantime, we are really excited about what Transports bring to Speckle and we’d love to hear what you (@Speckle_Insider & wider public) think - what does this inspire you to do? How could this be better?

yun.sung · 26 August 2020 11:11

Nice work - we have used Entity Framework to persist SpeckleObjects to SQLite and MongoDB, but could switch to transports for this. Is there a write-up on how to write a custom transport?

teocomi · 26 August 2020 11:17

@yun.sung we’re publishing an early alpha very soon that will contain the transports mentioned above:

an SQLite Transport
an In-Memory Transport
a Speckle Server 2.0 Transport
a Disk Transport

You can use those as a base until we, or our community, produce some proper documentation!

daviddekoning · 26 August 2020 11:47

The concept of transports looks quite powerful, and addresses some use cases we are asked about quite frequently (e.g. rapid transfer of data locally).

I’ve got a couple of questions:

First, when you push to multiple transports, are you creating multiple streams or synchronized versions of the same stream? My guess is that it creates multiple streams - is that right?

Secondly, @nicburgers and I had a few discussions about this and realized that notifications of changes can be a challenge. Will it be possible to subscribe to notifications from specific transports? Will each transport have a different mechanism? If there are “synchronized” streams across multiple transports, how will notifications work?

stevedowning · 27 August 2020 01:46

My emphasis added.
This implies that the transport layer and the storage layer are fairly closely coupled. At this point, I think that’s eminently pragmatic.

But from the user’s perspective it’s the location they care about (which server, which SQLite file), so I wonder if storage location is a better name.
Or something like that, acknowledging that ‘storage’ is not quite the right term for InMemory (which is a bit of a special case)

Or to give another example: A Git repository exists in one location (ignoring forks etc) but users may use different transport protocols (SSH or HTTPS or even FILE) to reach it.

Storage Location?
Exchange Point?
Handover dropzone?
Data Exchange?

Naming aside. I like this.
As others have said it opens up Speckle into some use cases that have been barriers to adoption (and then cause sub-optimal solutions like people starting again at the data schema level… )

dimitrie · 27 August 2020 05:55

There are quite a few good questions and remarks here, so I’ll be writing another long post now.

Yes, they are. They could be decoupled cleanly later on if needed, but I doubt we’ll ever reach that level of complexity. Especially given that the tooling available in .NET is quite advanced when it comes to this (as opposed to C) - we don’t need to write up our own HTTP protocol transport, it’s already there!

Naming-wise, it’s what I thought would work at the abstraction level that Speckle operates at. And ultimately, the ITransport interface doesn’t enforce the actual transport protocol one would be using - you’re free to use SSH or whatever fits. E.g., the server transport currently uses HTTP, the SQLite one uses the SQLite driver, the disk transport uses system files, etc.

What in git terminology would be References - namely Streams, Commits, Branches, etc. is a different abstraction layer in Speckle 2.0. Transports deal exclusively with storing and retrieving objects (our tree-blob hybrids).

These sit alongside the Transport layer. To give a short example: once you have an object id (hash)* from a ServerTransport, one can “bless” it as a Commit in a given Branch under a specified Stream on that specific Server.
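As a rough illustration of how such a reference layer could sit next to the object storage, here is a Python sketch. The `Stream` and `Commit` classes below are hypothetical and much simpler than whatever the Server does; they only show that a reference is a small record pointing at an object id.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Commit:
    """A reference: it points at an object id, it does not hold the object."""
    object_id: str
    message: str


@dataclass
class Stream:
    """Hypothetical stream holding named branches of commits."""
    name: str
    branches: Dict[str, List[Commit]] = field(default_factory=dict)

    def commit(self, branch: str, object_id: str, message: str) -> Commit:
        """'Bless' an already stored object id as a commit on a branch."""
        new_commit = Commit(object_id=object_id, message=message)
        self.branches.setdefault(branch, []).append(new_commit)
        return new_commit

    def latest(self, branch: str) -> Commit:
        return self.branches[branch][-1]


stream = Stream(name="MyProject_A")
stream.commit("main", "hash-of-first-version", "initial model")
stream.commit("main", "hash-of-second-version", "moved the columns")
head = stream.latest("main")
```

The objects themselves never pass through this layer; it only records which id counts as which version.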

We haven’t abstracted References under their own interface (yet?) as right now only the Server can handle this, but this might be a good idea. Doing this might actually simplify some future connector work. Anyway, I digress…

Notifications are bound to References, and objects. If you build up a reference layer on top of, for example, sqlite, you could use polling (unless there are db update events in sqlite). Or, if we get around this issue, they’ll be available in .NET on the Speckle Server client!

*My policy is to always talk about object ids as ids, but for the next six months or so, emphasize that they’re actually a hash

peter.kottke · 27 August 2020 07:37

The addition of transports is great, and it seems to be easily implemented in a Python wrapper with CLR, see below.

Does abstracting the individual object hashes from the overarching stream ID allow you to divide and recombine streams? For example, if I have a stream id A and I want to run a computationally expensive operation on it, and I decide to split the stream and run that on multiple machines that then send the results back as stream ids B1, B2, B3, B4, can I then recombine the hashes from those B streams into a single stream C?

Demonstration of Python interface with .NET API:

import clr
import numpy as np
clr.AddReference('SpeckleCore')

clr.AddReference("System.Collections")
from System.Collections.Generic import List

from Speckle.Core.Models import Base
from Speckle.Core.Api import Operations
from Speckle.Core.Transports import SQLiteTransport, ITransport

# Create the empty list of transports
transport_list = List[ITransport]()

# Add the required transport to the list of transports
transport_list.Add(SQLiteTransport(".", "Speckle", "columns"))

# Create the empty list of data
my_datas = List[Base]()
for i in range(500):
    # Create the new Base Model
    my_data = Base()

    # Add properties to the base model
    my_data.DynamicProperties["@columns"] = [x.item() for x in np.random.randint(0,10,5)]
    my_data.DynamicProperties["objects"] = [x.item() for x in np.random.randint(0,10,5)]

    # Add the new Base to the list of data
    my_datas.Add(my_data)

# Send all the list of data to the list of transports
my_ids = Operations.Send(my_datas, transports=transport_list, useDefaultCache=False).Result
print(my_ids)

# Try to recover the data. Fails with SQL because it is not yet implemented in .NET
# my_data_out = Operations.Receive(my_ids[0], remoteTransport=transport_list.get_Item(0)).Result
# print(my_data_out)

dimitrie · 27 August 2020 09:08

Heya @peter.kottke! Nice, I like it when code samples get bounced back. This reply is a joint venture with @teocomi, who pointed out some aspects I missed before (or didn’t explain too well).

I think the first part that needs deciphering is that streams, commits, etc. only exist currently on the Speckle Server. Object ids (hashes) are always the same, regardless of transport.

(Note to self: I think what’s needed next from us is a post on the “reference” layer of 2.0, namely commits, streams, etc.)

That aside, your workflow is totally doable - run some heavy operations in parallel on a chunked list of objects, then wait up for them and recombine them in a single commit, as long as you have the hashes of the modified objects. Travelling now so it’s a bit tricky writing a longer code sample, but I’ll try and fix up the one you posted:

PS: this should work if you pass your SQLite transport as the local one. We need to clean up the signatures of Send and Receive to make them easier to understand.

PPS: Instead of sending a list of objects, I would actually put all my datas in one dynamic property on a Base object; that way you only need to keep track of one id.

my_datas = List[Base]()
for i in range(500):
    # Create the new Base Model
    my_data = Base()
    # etc...

# Wrap the whole list in a single Base object, so there is only one id to track
my_commit = Base()
my_commit["@myObjectList"] = my_datas
my_commit_id = Operations.Send(my_commit, transports=transport_list, useDefaultCache=False).Result
print(my_commit_id)

# Receive it back, using the SQLite transport as the local one
my_commit_out = Operations.Receive(my_commit_id, localTransport=transport_list.get_Item(0)).Result
print(my_commit_out)

Sorry for mangling the code, I’m oblivious to Python (and its conventions).
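The split-and-recombine workflow discussed in this exchange can also be sketched without the SDK. In this Python sketch, everything is a stand-in: a dict plays the transport, `save` and `process_chunk` are hypothetical helpers, squaring numbers plays the expensive operation, and the SHA-256 id is an assumed hashing scheme.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List

store: Dict[str, str] = {}  # stands in for a transport's storage


def save(obj: Dict[str, Any]) -> str:
    """Store an object under the hash of its content and return that id."""
    serialised = json.dumps(obj, sort_keys=True)
    object_id = hashlib.sha256(serialised.encode("utf-8")).hexdigest()
    store[object_id] = serialised
    return object_id


def process_chunk(chunk: List[int]) -> str:
    """Pretend this is the expensive operation; returns the result's id."""
    return save({"results": [value * value for value in chunk]})


values = list(range(8))
chunks = [values[i:i + 2] for i in range(0, len(values), 2)]

# Run the chunks in parallel; map() returns the ids in chunk order.
with ThreadPoolExecutor(max_workers=4) as pool:
    chunk_ids = list(pool.map(process_chunk, chunks))

# Recombine: one parent object that only lists the ids of the parts.
commit_id = save({"@chunks": chunk_ids})
```

Only `commit_id` needs to be passed around afterwards, since the parent object leads to every chunk.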

nicburgers · 28 August 2020 05:31

For my understanding, if you hypothetically cared just about sharing a set of objects with other apps on the same machine, you could choose to only use a transport that uses local storage (i.e. SQLite etc), but I guess you would still need to register the (inherently transport-agnostic) IDs from those objects as a stream on a server so that the next app can know which objects to receive, is that right?

In that example, if someone using another computer were to try to receive on that stream, would there be baked-in ways to indicate that this stream isn’t available on that computer?

dimitrie · 28 August 2020 10:48

Hey Nic! I can say that yes - your understanding is correct. I need to fight a bit against some of the naming concepts in 1.0 now to make things a wee bit clearer.

The concept of a Stream changed. Programmatically, it’s replaced by the universal Base object (tree-blob). Conceptually, it stays the same. Let me try to unpack that:

The “programmatic” definition of Streams in 1.0, simplified, would be a flat list of objectIds. In 2.0, this is superseded by the Base object and the decomposition API that I’ve rambled about in a different post here.

So, in 1.0, you’d store a list of objects like this:

// assuming here we've saved the objects via the api previously
var myStream = new SpeckleStream(); 
myStream.Objects = myObjectReferencesList;

In 2.0, you can efficiently store not only lists, but any structure. Here’s how you would store something:

var myCommit = new Base(); 

myCommit["@LayerX"] = new List<Base>() { ... whatever objects }; 

var mySite = new Base(); 
mySite["@trees"] = allTheTrees; 
mySite["@roads"] = allTheRoads; 

myCommit["@Site"]  = mySite; // notice the "@" - it flags the props for detachment. 

// let's persist myCommit to a transport: 
var myCommitId = await Operations.Send( myCommit, transports: [ ... ]  );

Once you have that myCommitId - you can pull it back out anywhere that can reach the locations you sent it to - but at the transport level, there’s no centralisation of any sort, or streams, or anything. The other app would need the user to tell it which “objectId” to pull, and from where.
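To illustrate what flagging a property for detachment with "@" could mean at the storage level, here is a simplified Python model. It is a sketch under assumptions, not the decomposition API itself: `save_detached` is a hypothetical helper, the `referencedId` key and the SHA-256 id are choices made for this example, and a dict stands in for the transport.

```python
import hashlib
import json
from typing import Any, Dict


def save_detached(obj: Dict[str, Any], store: Dict[str, str]) -> str:
    """Store obj; children under "@" keys are stored separately by reference."""
    flattened: Dict[str, Any] = {}
    for key, value in obj.items():
        if key.startswith("@") and isinstance(value, dict):
            flattened[key] = {"referencedId": save_detached(value, store)}
        elif key.startswith("@") and isinstance(value, list):
            flattened[key] = [
                {"referencedId": save_detached(item, store)}
                if isinstance(item, dict) else item
                for item in value
            ]
        else:
            flattened[key] = value  # non-detached values stay inline
    serialised = json.dumps(flattened, sort_keys=True)
    object_id = hashlib.sha256(serialised.encode("utf-8")).hexdigest()
    store[object_id] = serialised
    return object_id


site = {"name": "Site", "@trees": [{"species": "oak"}, {"species": "elm"}]}
commit = {"@Site": site}

store: Dict[str, str] = {}
commit_id = save_detached(commit, store)
```

The store ends up holding four entries (two trees, the site and the commit), and the commit itself contains only a reference to the site.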

Now we switch hats again; as a developer, you care about all that stuff, and it’s really powerful to be able to customise transport layers in the back, as it opens up new doors when it comes to apps developed on top of Speckle. As a user, obviously, you don’t!

That’s why, essentially, connectors will work with a server transport, where actually you can guarantee the location of objects. This is because people using Speckle can’t really be bothered - they just want their objects back out.

tluther · 29 September 2022 18:54

I’m currently trying to use specklepy within a dockerized Lambda function and am having issues relating to the transport layer, presumably because it’s trying to write cache files on a system that’s read only.

The final exception that I’m seeing is:
“SpeckleException: SQLiteTransport could not initialise Objects.db at /home/sbx_user1051/.local/share/Speckle. Either provide a different base_path or use an alternative transport.”

I’m only using the receive method to read in a stream’s contents, and I’ve noticed that the local_transport arg is set to None by default. I’ve also tried explicitly setting it to None, but I’m still getting the same issue.

Has anyone come across similar issues and/or does anyone know how to completely disable any local storage from specklepy?

gjedlicska · 30 September 2022 09:43

Hey @tluther

Funnily enough, we ran into a similar situation ourselves a few weeks back. That was about sending, so I’m going to give an example of how to do both sending and receiving without Speckle writing anything to the filesystem.

To receive, we have a transport implementation that stores everything in memory. You need to explicitly use that transport for your operations.receive calls.

from specklepy.api import operations
from specklepy.transports import server, memory
from devtools import debug


object_id = 'your object id'
stream_id = 'your stream id'

token = 'your valid token'
url =  'the speckle server url ie https://speckle.xyz'

server_transport = server.ServerTransport(stream_id, token=token, url=url)
memory_transport = memory.MemoryTransport()

data = operations.receive(
    object_id, remote_transport=server_transport, local_transport=memory_transport
)

debug(data)

For sending, you can just disable the default cache in the operations.send function.

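# setup code omitted (client, stream_id, base and the ServerTransport import)

transport = ServerTransport(client=client, stream_id=stream_id)
# use_default_cache=False skips the local SQLite cache, so nothing is written to disk
object_id = operations.send(base=base, transports=[transport], use_default_cache=False)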

Let me know if this doesn’t work for you.

tluther · 30 September 2022 09:54

Awesome, thanks for that!

Gunjan_Patel · 22 November 2023 00:49

Snippet

var transport = new ServerTransport(speckleAccount, streamId, 120, "/tmp/speckle");
var objectId = Speckle.Core.Api.Operations.Send(speckleObject,
new List<ITransport> { transport }, useDefaultCache: false
).Result;

I am trying the above code in a C# AWS Lambda and it is giving me a read-only file system error. I need urgent help as it’s in production.

Jedd · 22 November 2023 11:31

Hi @Gunjan_Patel
Do you have a full exception message with stack trace?

Just from a quick look at your snippet, you’ll need to make sure that blobstorage directory has been created before you call the constructor.