Skytable Octave is here

December 5, 2023
By Sayan Nandan

After our 18-month hiatus from major releases, we’re extremely happy to announce that Skytable Octave (0.8.0) is out today. We’re releasing it as a public beta today although it is already stable and will ship the final GA release in a week or two. There’s a lot packed in this release, so let’s dive right in!

A new world, a new data model

Skytable Octave is a new database, even for Skytable users who have been along with us since 2020. After a lot of feedback from our community, we realized the limitation with our key/value engine. We realized that our users need a more powerful querying interface. So, we went back to the drawing board and re-imagined everything from the ground up. Let’s walk through this new world.

For the past three years, Skytable has only been a key/value store with GET and SET for data and other commands for data types, but now Skytable supports structured and semi-structured data and we’re working on supporting completely unstructured data. You can now create fields, use nested collections in columns, use counters, append to strings, collections, do arithmetic and more.

Meet spaces and models

A space is a collection of models, settings, properties, permissions and collections. It’s semantically like a SQL database but significantly different otherwise. A model is a definition of a data model that is semantically like a SQL table but has richer data types, more collections and advanced properties. But before that you should learn about our new query language, BlueQL.

Meet BlueQL

BlueQL is Skytable’s own query language, designed from the ground up as a powerful, modern and secure alternative to SQL. With Skytable’s database engine and BlueQL we aim to give you the best of SQL with some of our design improvements.

To start with BlueQL is extremely hardened against injection attacks (see below) and its design deters injection attacks at development time. This includes advanced expression sanity enforcement, single statement per query, and mandatory parameterization for variable data.

Here are some sample queries for you to run using the skysh REPL. Note that you’re able to freely use data types here because the REPL parameterizes them behind the scenes:

CREATE SPACE IF NOT EXISTS myspace
CREATE MODEL IF NOT EXISTS myspace.mymodel(
  username: string,
  password: binary,
  notes: list { type: string },
  followers: uint8,
)
USE myspace
INSERT INTO mymodel {
  username: "john appleseed",
  password: "mypass",
  notes: ["my first note"],
  followers: 100,
}
UPDATE mymodel SET notes += "second note", followers += 10 WHERE username = "sayan"
SELECT username, password FROM mymodel WHERE username = "sayan"

You can start learning BlueQL here.

That looks just like SQL?

That’s a great question and we’re glad that you think so! The first thing to remember is that Skytable at the core is a NoSQL engine and hence it’s internals will be quite different from relational systems. There are a lot of differences. For example:

All DML queries are point queries and not range queries:
- This means that they will either return atleast one row or error
- If you intend to do a multi-row query, then it won’t work unless you add ALL. ALL by itself indicates that you’re applying (or selecting) a large range and can be inefficient
Multi-row DML queries are slow and inefficient and are discouraged (you can visualize it as rows being on physically different nodes and the delay involved in applying changes to each such node)
You can only query on the primary index, once again because of speed (and the problem with scaling multiple indexes) with a fixed set of operators
Remember, in NoSQL systems we denormalize. Hence, no JOINs or foreign keys as with many other NoSQL systems
A different transactional design:
- All DDL and DCL queries are ACID transactions
- However, DML transactions are not ACID and instead are efficiently batched and automatically made part of a batch transaction. The engine decides when it will execute them, for example based on the pressure on cache. That’s because our focus is to maximize throughput
- All these differences mean that DDL and DCL transactions are ACID transactions while DML queries are ACI and eventually D (we call it a delayed durability transaction). This delay however can be adjusted to make sure that the DML queries emulate ACID transactions but that defeats the point of the eventually durable system which aims to heavily increase throughput.

These along with many more differences make Skytable fundamentally different from SQL systems, and even many other NoSQL systems. We strongly recommend that you read more about architecture here (WIP).

BlueQL’s efforts at being explicit

A chief goal of BlueQL is to be explicit about the intent of the query, even if it might seem unnecessary. For example, here’s a trivial comparison:

SQL	BlueQL	Notes
`DROP TABLE users`	`DROP MODEL ALLOW NOT EMPTY users`	The drop command has been traditionally associated with the removal of a table and it’s data. BlueQL’s `ALLOW NOT EMPTY` explicitly notes that you’re removing the model and it’s associated data and not just the model
`DROP DATABASE myapp`	`DROP SPACE ALLOW NOT EMPTY myapp`	Yet again, BlueQL explicitly notes that the space and it’s associated data is being removed
`SELECT * FROM mytbl`	`SELECT ALL * FROM mytbl LIMIT <cnt>`	Yet again, BlueQL explicitly notes that we intend to select multiple rows and not just one row

Similarly, the INSERT, SELECT, UPDATE and DELETE can modify atmost one row, once again satisfying the point query property.

These are just a few syntactical differences. The internal working of BlueQL is fundamentally different from how SQL is evaluated and executed.

Injection attacks

As we noted earlier, BlueQL is hardened against injection attacks. Injection was solved a long time ago but was never enforced. Matter of fact, per OWASP it still remains at #3 of the Top 10 most exploited vulnerabilities¹!

So how do we do it? We enforce strict standards at the language level, most prominently: mandatory parameterization. A very trivial example:

This would work in SQL and can be escaped immediately providing a surface for an injection attack:
```
q = "SELECT * FROM users WHERE username = '" + unsanitized_input + "'"
```
But it wouldn’t even as much as compile in BlueQL due to it’s mandatory parameterization requirements.

There are more techniques used such as expression contexts and statement positions — but we’ll leave that for a later post.

Note that this is a release blog post and is not intended to be a technical overview of BlueQL’s technical design. A more technical description of how BlueQL can potentially prevent several common injection attacks is being worked on and will be shared with the community soon.
Also, while obviously implied but not explicitly said, no one can prevent attacks and our goal is to reduce the surface that allows such mistakes to be made.

So who is Skytable for?

Skytable’s target audience includes:

People who currently use a NoSQL database but want:
- A more familiar querying experience
- Better performance²
- People who currently use a key/value store but want more powerful querying and manipulation and still get better performance (even with BlueQL)
People who currently use a SQL database and:
- Can benefit from NoSQL’s general peformance improvements resulting from non-relational data³
- Want to use complex collections (lists, dictionaries, JSON, and custom types)
- Do not have too many relational requirements (JOINs, dependence on cross-referencing with foreign keys, etc.)
- Prefer throughput over consistency (usually applications which are like large social networks, streaming platforms, etc.)
Individuals who have concerns about security in critical environments (people will make mistakes but we’ll make the bar higher for making such mistakes)
And of course, people who want to see innovation in database systems

Our philosophy is:

If it’s fast, it’s worth it. If it scales, it makes sense. If it’s bad, don’t let them do it easily and if they want to do it, let them be explicit about it.

This is just the start. There’s more to come.

We’re just getting started. As our vision takes shape, you can expect many more collections to be shipped, PUB/SUB like components to be released and advanced collections DML to be incorporated into BlueQL.

Here are some links to get you started:

GitHub: get our source code on GitHub
Docs: read installation instructions here
Discord: Need help? Join our Discord Community here

See you at the next release. I won’t take your time! (While we get up to speed with our clustering and replication implementation that we’re going to ship in Q1 2024).

What you need to do right now is head over to our documentation and get started. Take Skytable for a ride, or … a flight!

See OWASP’s official text here ↩︎
Official detailed benchmarks including throughput, resource usage and more will be published soon. ↩︎
Performance varies from case to case, but generally if you don’t use SQL’s relations you will likely be able to achieve greater throughput ↩︎

Last updated on: December 15, 2023

6 min read

Menu