Sync framework motivation

by Jeff Vroom

One of the larger complicating factors in software design, performance, and maintenance arises when two programs need to communicate. The developer needs to design or choose a protocol, and perhaps add versioning so it can be easily upgraded or changed. They need to make it secure and reliable, possibly deal with retries in case of failure, and handle the additional connection-related errors that might occur.

In my experience, quite a bit of code goes into solving this problem. It's one of the largest contributors to "code that's not application logic" that developers spend time writing or debugging.

Newer frameworks generate the serialization code from annotations on methods and their parameters, or use a declarative file to define the protocol and generate code for both client and server. This approach reduces the amount of code developers have to write and provides more features, with the downside of making debugging harder. It's complex to manage all of the RPC calls an application might need for a complex domain model that is evolving quickly with lots of read/write APIs. It's not much fun to debug generated code that comes from some XML file, but it's better than nothing.

GraphQL and other approaches have tried to create a declarative abstraction that's more powerful than basic RPC with some success, but it's quite common to see an explosion of RPC wrapper methods on top of domain models to deal with each new case your UI needs. The next time your domain model changes, it's more work updating all of the code you wrote to call the RPC and apply the results.

I think we've been thinking about it all wrong, and GraphQL is only a band-aid. Most client/server programming problems are really problems of synchronizing data structures that live in different processes. There is shared code and data, and a complex, changing interaction governing how and when different parts of this data structure are transferred, validated, updated, and ultimately committed to a database. Instead of RPC, we should think in terms of the synchronized aspects of the types used by both applications. That means overlapping versions of the types in each application with a shared layer of properties and validation that both of them have, rich metadata for versioning and validating inputs, and flexible protocols for communicating with older versions.

The overlap is rich in logic: validation and security constraints, on-demand versus eager strategies, filters, and special operations you define to override the defaults. The best way to manage such a complex relationship is through a rich declarative framework built on a full-featured language like Java that can generate both sides of the API.

This means the framework can efficiently manage versions and compatibility, and generate statically typed classes and interfaces that tie into the application code you wrote, for easier debugging.

One design consideration important for synchronization, but not required by RPC, is that you identify a unique id in objects when they have one. If the object has a unique id, it's almost always best to enforce the "highlander principle" in your APIs, so that there's only a single instance of that entity in memory in a given process at a time. That's not easy to do with most RPC frameworks, but it is possible with synchronization.
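
To illustrate the idea (a minimal sketch with hypothetical names, not part of the framework), an identity map enforces the highlander principle by resolving an id to the one in-memory instance instead of ever creating a second copy:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical identity map: one in-memory instance per unique id.
    class ProductCache {
       private final Map<Long, Product> instances = new ConcurrentHashMap<>();

       // Always returns the single shared instance for this id, creating it on first use.
       Product resolve(long id) {
          return instances.computeIfAbsent(id, Product::new);
       }
    }

    class Product {
       final long id;
       String name;   // a change here is visible to every reference holding this id
       Product(long id) { this.id = id; }
    }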

When that's true, your code can access properties, either declaratively or by name in code, and know the value is up to date. Any property changes will be validated and committed. This approach avoids errors that arise when your code is doing something more complex behind the scenes - communicating with a remote process which needs to commit the change to a database. When there are errors or conflicts, additional versions of that instance are available, so those cases are handled declaratively rather than by writing lots of special-case code.

For most problems, synchronization is a better fit as the foundation. RPC remains useful for special cases, specifically 'operational requests' - basically when only value objects are involved and no objects with an id.

With synchronization, some server types and properties are marked as synchronized for a given client, and there is a carefully designed set of metadata you can use to change the defaults. The client initially renders the default HTML page, which is generated by evaluating the initial state of the server version of the template. The initial page also contains code to initialize the client version of the same set of objects. This way, when the client JavaScript refreshes the page for the first time, it generates the exact same version - no update required - but it can refresh any aspect of the UI incrementally.

When the client needs to make changes, the same metadata is used in the opposite direction. Changes are queued up and, on the next 'sync', sent and applied on the server. Certain properties are fetched on demand; others may be explicitly fetched using an API. When a new object is created on the server in response to a client's request, it can automatically be sync'd to the client in the reply, or lazily fetched when it's first referenced by an object being sync'd to the client. In many applications, declarative annotations let you reduce or eliminate the wrapper code and many of the RPC methods you'd otherwise write by hand. When RPC is needed, it works naturally with the sync framework: you can use synchronized objects in any method call, and they are passed by reference or by value as required by the sync protocol.
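
For illustration, here's a minimal sketch of a synchronized domain class shared by client and server. The class and property names are hypothetical, the import path for @Sync is assumed, and the onDemand attribute is an assumption about the available metadata rather than a definitive reference:

    import sc.obj.Sync;   // assumed package for the @Sync annotation

    // Shared between the client and server versions of the application: instance
    // creation and property changes are recorded and applied on the next sync,
    // in either direction.
    @Sync
    public class Order {
       public String status;                       // synchronized by default

       // Assumed metadata: fetch the line items only when the client first uses them.
       @Sync(onDemand=true)
       public java.util.List<LineItem> lineItems;
    }

    @Sync
    class LineItem {
       public String productName;
       public int quantity;
    }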

Synchronization and layers allow you to organize your code based on its role in the client/server design. Code is split into layers based on whether it runs on both the client and the server, on the server only, or on the client only - a natural way to break apart a design in any case. Since static typing is used, it helps developers navigate code and build code that is resilient to changes in code/process boundaries. When you have errors, the IDE provides a high-level error message. With this design, StrataCode can automatically detect, for example, when a method call crosses into a remote process, and at code-generation time replace it with a specific RPC wrapper based on the context.

Synchronization is implemented with a small library whose calls are optionally inserted into your code at code-generation time, making it possible to debug. The metadata required for synchronization is added as readable code to the generated files. To enable synchronization, fields are automatically converted to get/set methods; the set method fires the change event that notifies the sync system. When a new synchronized instance is created, the system is notified. Remote method calls can be handled using code generation with proper error detection: if the runtime does not support synchronous remote methods, such a call is only allowed in a data binding expression, and otherwise an error is generated at compile time.
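
As a rough illustration of the idea (not actual generated output - the notifier class and method here are hypothetical stand-ins), a plain field like 'public String displayName;' might be converted into something resembling:

    // Rough sketch of the field-to-get/set conversion described above.
    public class Account {
       private String displayName;

       public String getDisplayName() {
          return displayName;
       }

       public void setDisplayName(String newName) {
          String oldName = displayName;
          displayName = newName;
          // Fire the change event so the sync system records the change and any
          // data bindings depending on 'displayName' are updated.
          SyncNotifier.propertyChanged(this, "displayName", oldName, newName);
       }
    }

    // Hypothetical stand-in for the sync library's change-recording entry point.
    class SyncNotifier {
       static void propertyChanged(Object inst, String prop, Object oldVal, Object newVal) {
          // In the real library this queues the change for the next sync.
       }
    }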

Sync features

One of the advantages of doing synchronization declaratively is that advanced features can be turned on through configuration alone, by leveraging the synchronization design pattern.

With data binding and data sync in StrataCode, there's no need for asynchronous programming or function closures. You can expose a set of methods to your declarative programmers in the same way that Excel exposes functions. The declarative programmer can use them in data binding expressions without needing to know whether the method runs on the client or the server, and you can change the implementation later without changing the code that uses the method binding.
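
As a small sketch (all names here are hypothetical), a declarative programmer can use such a method in a forward (':=') binding without knowing where it runs:

    // Hypothetical names throughout. convertedAmount is recomputed whenever
    // amount or currency changes; whether convert() runs locally or on the
    // server is hidden from this code.
    class CurrencyService {
       static double convert(double amount, String currency) {
          // Placeholder conversion - in a real app this might be a remote method.
          return "EUR".equals(currency) ? amount * 0.9 : amount;
       }
    }

    object currencyView {
       double amount = 100.0;
       String currency = "EUR";
       double convertedAmount := CurrencyService.convert(amount, currency);
    }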

Realtime

Synchronization as a pattern also works in both directions: the client can queue up changes for the server and vice versa. When using polling, the client picks up any queued changes waiting for it; when using a realtime connection, the client sees the results immediately.

Sync and scopes

Synchronization is built on the "scope" mechanism in StrataCode, where different objects can be configured with different lifecycles via an annotation. An object can be global to the system, global to each application (for frameworks, like a web server, that support more than one application per process), one per browser session, one per application per browser session, one per browser window, or one per request. You can also create your own scopes for situations where you have a 'current something' - e.g. the current tenant in a multi-tenant application, or the current product in a product display component.
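
As a sketch (the scope names used here are assumptions based on the lifecycles above, and the import path is assumed), you choose an object's lifecycle with an annotation:

    import sc.obj.Scope;   // assumed package for the @Scope annotation

    // One instance per browser session (scope name assumed).
    @Scope("session")
    class ShoppingCart {
       java.util.List<String> itemIds = new java.util.ArrayList<String>();
    }

    // One instance per browser window (scope name assumed).
    @Scope("window")
    class ProductView {
       String currentProductId;
    }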

When scopes are nested, events propagate from the lower-level scope to the higher-level one. The system uses read/write locks to prevent two conflicting threads from running at the same time. So if an object is shared between two scopes, changes made in one are queued and delivered to the other on the next thread that runs in the destination scope.

For example, you can use the StrataCode command line to run methods on a selected application window in the browser. This makes it easy to write test scripts for client/server applications, even those that support collaborative features.

Your test script can direct commands to the server, the client, or both. Test script logs are designed to record important, reproducible history for validation and diagnostic purposes. For example, the StrataCode HTML framework class TestPageLoader records the client JavaScript error log, the server log, and the HTML output for the current page on both the client and the server.

Reliability and responsiveness

Using synchronization to manage state has a nice benefit for scaling web applications. A variety of scopes are available for caching data for different purposes - global, session, or window. When your server caches information, you avoid unnecessary database accesses and reduce request latency without needing intermediate caches, simplifying your code even for high-volume web sites. If the client browser window is closed and reopened, it can refresh its session from the server.

Similarly, if the server session goes away - because it expired, or the server process or some router in between was restarted - the client also maintains a copy of the synchronized state, which can be used to refresh the server's session. The result is a seamless, high-end user experience with low cost in both coding and operation.

This approach is designed to let developers balance reliability and efficiency, all from a declarative model. I believe it will ultimately be an easy way to produce scalable web applications with low latency, simpler database architectures, and cheaper infrastructure. You are probably already caching data with memcached, Redis, or some other out-of-process tool; eliminating that yields lower latency and more efficiency with less code and fewer servers.

When you really need a stateless environment, use 'request' scope at the page or component level and the same models work, just with more data transferred on each request. Data binding with RPC methods in a stateless environment is a nice alternative to existing stateless web frameworks.

Easier multi-process development

There are many advantages to decoupling software components, as long as they can be efficiently managed independently. We've all read about microservices and the success of businesses like Amazon, which made decoupling of systems a major priority. But the downsides of a decoupled system are the design costs of choosing the pieces and the complexity of standing up a complete environment composed of many pieces. Developers need a quick and reliable workflow where they can change code, build, run, and test even when they are touching code in more than one process.

With StrataCode, you can build all of the processes required in a solution from the same stack of layers. The layer definition files are also the perfect place for the code that manages the deployment process, updating and restarting processes as necessary. So for any project, just running 'scc list-of-layers' should be the way any type of technical user runs a specific program, no matter how many processes or environments are required to implement that operation.

Layers can handle all of the details, including all of the configuration specific to this user's environment. Because StrataCode itself is built on code generation and supports Maven, multiple processes, dependencies, and more, it's a better tool for devops too. They can manage production layers which keep their environments separate from development, let them customize more aspects of the system, avoid copies, and use static typing for better management of their assets.
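
For example, with hypothetical layer names, a single command might build and run both the server and client processes of one application:

    # Layer names are hypothetical - scc builds and runs whatever processes
    # those layers define (e.g. a server and a client).
    scc example.todo.server example.todo.client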

See these examples of basic applications built using synchronization: UnitConverter, TodoList, and Program Editor.

Client/server synchronization using layers

Today client/server programmers commonly use remote procedure calls (RPC) with a serialization protocol like JSON to communicate with the server. Some frameworks require that you write code to convert objects to and from JSON; others handle it transparently or declaratively, often via code generation. The Sync framework also supports RPC and object serialization using code generation. You can add the @Sync annotation or call SyncManager.addSyncType with the property metadata.

The biggest hassle with direct RPC code is that you have to handle the response asynchronously (at least when you are using an asynchronous runtime like the browser). Data binding gives you a way to avoid response listeners: bind to the remote result, or trigger the remote method in a reverse-only binding. When multiple changes fire at the same time, often due to chained bindings or methods run from a binding, the stream of change events is sent to the server and the stream of results is returned. Properties are updated, and the UI is updated in response to those changes.
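
Here's a minimal sketch of the reverse-only form (all names are hypothetical, and the exact shape of the binding is an assumption): when the bound property changes, the '=:' expression runs, which may invoke a remote method with no explicit response listener.

    // Hypothetical names throughout - a sketch of the idea, not a definitive form.
    class OrderService {
       static void saveOrder(String orderId) {
          // On the server this might validate and persist the order.
       }
    }

    object orderForm {
       String orderId = "A-100";

       // Reverse-only binding: when submitClicked changes (e.g. a button sets it),
       // the right-hand expression runs - here calling what may be a remote method.
       boolean submitClicked =: OrderService.saveOrder(orderId);
    }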

When you use this model, you have declarative/functional access to data you retrieve asynchronously, and can cleanly invoke remote methods in response to events even when they are async. The same data binding expressions work in either a client/server environment or a local one, which makes it easier to reuse domain model code even when the domain model has remote methods. This is particularly valuable when a business user is editing data-binding expressions that control some combination of client and server functionality.

By its nature, RPC involves copying state back and forth. Programmers have to write that code explicitly and spend a lot of time on tricky code that amounts to reimplementing the same patterns over and over again. What you are really doing is synchronizing objects, or parts of those objects, between the client and server. StrataCode offers a model for doing just that - synchronizing objects automatically using declarative patterns.

You annotate classes, objects and properties with the @Sync annotation, or put them in a shared layer and annotate the layer itself with @Sync. If you set @Sync on a layer, StrataCode detects which layers overlap between different processes and generates code to synchronize just the overlapping classes, objects, and properties back and forth.

When a type is marked with @Sync, calls are inserted into the code to track instance creation and property changes. Those events are recorded and sent to the client or server on the next sync (i.e. at the end of the request, or at the end of some user-interface interaction).

The synchronization framework lets you build rich, interactive apps with complex, graph-based data models using few or no remote procedure calls, no data transfer objects, and a statically typed protocol - for better error reporting and easy versioning. The declarative patterns implemented by the sync framework support the most common use cases - create, update, delete, lazy fetching of references and collections, and more. When your application needs more control, you can set the @Sync annotation on properties or types to fine-tune the behavior. If you still need more control, you can fall back to RPC on the same objects, or start out with an RPC call and use synchronization to apply or receive changes to the results. When you call a remote method using already-synchronized objects as parameters, those arguments are passed by reference, making it much easier to build an API that works both locally and remotely.

The protocol between client and server is designed to be readable and organized - so when you change "firstName" and "lastName" of a User object, they come across as two property changes of the User type. Serializers are implemented in JSON and SCN (a StrataCode layer which is converted to JS before being shipped to the client).
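
As a purely schematic illustration (not the actual wire format), the JSON serialization of such a change might convey something like:

    {
       "sync": [
          { "type": "User", "id": "user_1",
            "changes": { "firstName": "Ada", "lastName": "Lovelace" } }
       ]
    }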

Read more in the documentation.