Knative OSS Diaries – week #18
Another super exciting week went by, with the Knative 1.0 GA release and Devoxx UK happening here in London. To my surprise, there were a lot of Knative mentions at the conference, and even `func` had a session. I totally recommend checking out the following Knative-related sessions:
- Next Generation Microservices using Knative and `func`
- Cloud Native with Spring Boot and Kubernetes
- Crossing the Platform Gap
Following the 1.0 GA release, there were a bunch of blog posts worth reading regarding the release, the history of the project, and the commercial offerings around it.
Now that the release is out, I've started playing with Spring Native and Knative `func`,
and I will be building a small demo to understand how all the pieces fit together and to make sure that developers looking into the project have a simple and awesome experience. I am excited about this project because it should simplify the development experience for function-based workloads, hiding the complexity of Kubernetes, Docker, and even Knative.
Also, because I like to have a good read before going to sleep, I got a couple of books at Devoxx from authors that I consider my friends: Practical Process Automation by Bernd Ruecker and Knative Cookbook by Burr Sutter.
Also recommended by @PaulaLKennedy, a more serious read about a topic that I've been getting more interested in lately: Team Topologies by Matthew Skelton and Manuel Pais.
Notes from Devoxx UK
- dbt -> centralised metadata stores (experience from autotrader.co.uk)
- DataHub (from LinkedIn)
- Amundsen (from Lyft)
- Marquez (from WeWork)
- Google Data Catalog?
- open-metadata.org
- openlineage.io
- Apache Kafka vs Apache Pulsar
- Producer streams -> put data in a place so other applications can consume it
- Kafka and Pulsar store the data locally (like a database)
- Consumers can also ask for all the data that Kafka or Pulsar have stored (replay the log, go back in time; see the Kafka sketch after the Kafka notes below)
- This is not easy to do with messaging systems
- Kafka and Pulsar can isolate consumers, so if one consumer has problems, it doesn't affect the others
- Kafka (from LinkedIn) is much more mature than Pulsar; Pulsar (from Yahoo, https://pulsar.apache.org) is just getting started and was designed to solve some of Kafka's problems
- The adoption of Pulsar is much slower
- Pulsar is being adopted by large companies, but not as widely as Kafka
- Alternatives: Redmonk, nats.io
- Differences
- Kafka
- Limitations:
- Kafka is an HA system; it runs on more than one machine (Kafka nodes/brokers)
- Producers send data to all brokers (e.g. 3), so if one broker fails the others can keep working
- Kafka uses ZooKeeper?? Still? It is used to decide which broker is the leader
- Each broker has its own storage (of course). Kafka has partitions (stored in files), and Kafka writes messages in the order that they arrive
- Partitions are the way of scaling: more partitions, more throughput. If we have more consumers than partitions, the extra consumers will not be able to read from Kafka (in parallel, I guess)
- It is key to understand how to define the number of partitions; it is not easy to decrease it later
- Too many partitions is too costly: more partitions means more problems and more coordination on the consumer and broker side
- Scaling the number of brokers up and down is hard because new brokers will be empty, so you need to move data onto these new empty brokers (you need to do this yourself)
- Brokers need to replicate data so that if a broker breaks, consumers can go to another broker
- This increases the traffic in the cluster a lot
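
To make the producer/partition/replay notes above a bit more concrete, here is a minimal sketch using the plain `kafka-clients` Java API. The broker address (`localhost:9092`) and the `events` topic are made up for illustration: it appends a few records and then replays partition 0 from the beginning.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class KafkaReplaySketch {

    public static void main(String[] args) {
        // Hypothetical local broker and topic, just for illustration
        String bootstrap = "localhost:9092";
        String topic = "events";

        // Producer: append a few messages; Kafka stores them in partition order
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", bootstrap);
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            for (int i = 0; i < 3; i++) {
                producer.send(new ProducerRecord<>(topic, "key-" + i, "value-" + i));
            }
        }

        // Consumer: assign a partition explicitly and seek to the beginning,
        // i.e. "replay the log, go back in time"
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", bootstrap);
        consumerProps.put("group.id", "replay-demo");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            TopicPartition partition = new TopicPartition(topic, 0);
            consumer.assign(List.of(partition));
            consumer.seekToBeginning(List.of(partition));

            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}
```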
- Pulsar
- Same concept of broker; it also uses ZooKeeper
- Brokers don't keep the data locally; they send the data to BookKeeper nodes (bookies)
- Brokers send data to the bookies, which basically removes the need for partitions
- Scaling storage is much easier with Pulsar because bookies are easy to scale
- Brokers are stateless
- Consumers are also easy to scale
- Kafka already has a version without ZooKeeper
- Pulsar is more Kubernetes-friendly; it provides more features to scale up and down in Kubernetes (a minimal Pulsar producer sketch follows)
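
As a rough counterpart on the Pulsar side, a minimal producer sketch with the Pulsar Java client, assuming a hypothetical local broker at `pulsar://localhost:6650` and a made-up `events` topic; the broker receiving these messages hands the data off to the bookies rather than storing it itself.

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class PulsarProducerSketch {

    public static void main(String[] args) throws Exception {
        // Hypothetical local Pulsar broker; the broker is stateless and
        // delegates storage to BookKeeper (the bookies)
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build()) {

            Producer<byte[]> producer = client.newProducer()
                    .topic("events")
                    .create();

            producer.send("hello from pulsar".getBytes());
            producer.close();
        }
    }
}
```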
- More consuming patterns
- Kafka patterns:
- Shared and key-shared
- Pulsar patterns:
- Exclusive (only one consumer)
- Failover (only one active consumer; if that consumer fails, another consumer takes its place)
- Shared and key-shared (see the consumer sketch below)
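
A quick sketch of how those Pulsar consuming patterns surface in the Java client: the subscription type is just a setting on the consumer builder. Topic, subscription name and endpoint below are made up for illustration.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class PulsarSubscriptionSketch {

    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build()) {

            // The subscription type selects the consuming pattern:
            // Exclusive, Failover, Shared or Key_Shared
            Consumer<byte[]> consumer = client.newConsumer()
                    .topic("events")
                    .subscriptionName("orders-processor")
                    .subscriptionType(SubscriptionType.Key_Shared)
                    .subscribe();

            Message<byte[]> msg = consumer.receive();
            System.out.println("Received: " + new String(msg.getData()));
            consumer.acknowledge(msg);
            consumer.close();
        }
    }
}
```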
- Tiered storage and geo-replication
- Kafka
- Geo-replication features are commercial only
- tiered storage (commercial)
- Pulsar
- geo-replication (replicate data across different clusters)
- namespaces (isolation, similar to Kubernetes namespaces, to organise data and traffic)
- tiered storage (for cloud providers)
- Message size
- Kafka: the optimum message size is about 1 KB, up to 1 MB; beyond 1 MB it sucks and doesn't work by default (see the producer config sketch below)
- Performance reasons
- Pulsar: 5 MB limit
- Performance-wise they are super close, so there is no reason to choose one or the other based on performance
- There is a proxy for bridging from Kafka to Pulsar
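
For that ~1 MB default, the producer-side knob in the Kafka Java client is `max.request.size`; the sketch below raises it, assuming the broker/topic `message.max.bytes` (and the consumer fetch sizes) are raised to match. The endpoint and sizes are illustrative only.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class LargeMessageProducerSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        // The producer-side limit defaults to ~1 MB (1048576 bytes); raising it
        // here only helps if the broker/topic 'message.max.bytes' and the
        // consumer fetch sizes are raised as well
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 5 * 1024 * 1024);

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            // send large payloads here
        }
    }
}
```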
- Crossing the Platform Gap by Laura (Syntasso):
- Platform teams and application teams
- Collaboration ... shared responsibility
- “As a service” boundaries
- “Continue with lightweight collaboration to validate”
- XaaS: delineate responsibilities
- Platform as a Product
- https://github.com/syntasso/kratix
- Stream and SQL by @morsapaes (link to the blog post explaining how Materialize works)
- Materialize is pretty cool, as it allows you to stream data into it and then ask questions about it. It creates aggregations and materialized views in memory for efficient querying
- OLTP -> OLAP (offload analytics workloads to a second DB)
- OLTP -> row-based
- OLAP -> columnar-based (attribute-oriented)
- OLVM -> push-based (online view maintenance over streaming data); pre-compute queries
- Metabase dashboards
- You create queries in Materialize for things that you want answered repeatedly on top of live data (see the sketch below)
- Timely Dataflow, written in Rust
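
As a sketch of that workflow: Materialize speaks the PostgreSQL wire protocol, so a plain Postgres JDBC connection should be enough to create a materialized view and then query it repeatedly. The connection details, source table and view below are made up for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MaterializeSketch {

    public static void main(String[] args) throws Exception {
        // Materialize exposes a PostgreSQL-compatible endpoint, so the standard
        // Postgres JDBC driver works; host, database and names are hypothetical
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:6875/materialize", "materialize", "");
             Statement stmt = conn.createStatement()) {

            // Pre-compute the answer once, as a materialized view over streaming data...
            stmt.execute(
                "CREATE MATERIALIZED VIEW orders_per_minute AS " +
                "SELECT date_trunc('minute', created_at) AS minute, count(*) AS orders " +
                "FROM orders GROUP BY 1");

            // ...then ask the same question repeatedly and get up-to-date answers
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT minute, orders FROM orders_per_minute ORDER BY minute DESC LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getTimestamp("minute") + " -> " + rs.getLong("orders"));
                }
            }
        }
    }
}
```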
- Schemas for events, structured data in Kafka
- AsyncAPI for describing your event producers and consumers: https://www.asyncapi.com
- Registries:
- Apicurio Registry: github.com/apicurio/apicurio-registry
- Kafka, PostgreSQL
- Formats: JSON, Avro, Protobuf (see the Avro sketch below)
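
To illustrate the "schemas for events" point, here is a minimal sketch that defines an Avro schema in code and builds a structured event from it. The schema and field names are made up, and the registry/serializer wiring (e.g. Apicurio) is left out.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;

public class AvroEventSketch {

    public static void main(String[] args) {
        // Hypothetical schema for an order-created event; in practice the schema
        // would live in a registry rather than inline in the code
        String schemaJson = """
            {
              "type": "record",
              "name": "OrderCreated",
              "namespace": "com.example.events",
              "fields": [
                {"name": "orderId", "type": "string"},
                {"name": "amount", "type": "double"}
              ]
            }
            """;

        Schema schema = new Schema.Parser().parse(schemaJson);

        // Build a structured event that conforms to the schema
        GenericRecord event = new GenericRecordBuilder(schema)
                .set("orderId", "order-123")
                .set("amount", 42.5)
                .build();

        System.out.println(event);
    }
}
```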