EnqueueZero Techshack 2019-04

Kia ora! I am the creator of Enqueue Zero (https://enqueuezero.com), a site that explains code principles to Developers/DevOps/Sysadmins. This site is updated under a single man team since 2018. The `EnqueueZero Techshack` series is comprised of a bunch of interesting posts I wrote or found on the Internet. More importantly, I hope these posts would please you.

Please follow Twitter account @enqueuezero for the updates. Happy exploring!

Canary analysis: Lessons learned and best practices from Google and Waze

cloud.google.com

Investing in canary deployment will greatly increase your confidence in your deployment processes, lower the number of problems that impact your users, increase your velocity, and hopefully lower your stress level! Below are the things to care:

  • Pipeline: configuration -> set latest version -> bake GCP (allocate cloud resource) -> canary analysis -> manual judgement -> send production update -> deploy red-black
  • Prerequisite of canary deployment:
    • an external monitoring system to provide the data for canary analysis.
    • build SLOs to measure the health of the services.
  • Canary best practice:
    • Compare the canary against a baseline, not against production
    • Run the canary for enough time
    • Carefully choose which metrics to analyze
    • Create a standard set of reusable canary configs

The Night of A Cascading Failure

rachelbythebay.com

What a tragedy! Someone tripped over a bug, submitting code that can yield a negative number calculated for a size_t type. Unfortunately, when the real case happens, the entire system went down, including the bastion host. Seriously, almost NO ONE can log in to the data center and fix the issue.

Lesson Learned:

  • Rather than string splitting, it's better to use Regex for matching what you exactly need.
  • Rather than having the whole world die, let your service alive even if the dependencies are unavailable.
  • Think about the possibility of failure injection.
  • Be kind to your coworkers. They didn't write the bug. They just tripped it.

Scaling Engineering Teams via Writing Things Down and Sharing - aka RFCs

blog.pragmaticengineer

Check out the approach if you happen to have below problems:

  • The lack of visibility on others building or having built the same thing as my team.
  • The tech and architecture debt accumulated due to different teams building things very differently, both approach-wise and quality-wise.

The solution is by taking below steps:

  • Do planning before building something new.
  • Capture this plan in a short, written document.
  • Have a few, select people approve this plan before starting work.
  • Send this planning document out to all engineers in the company.
  • Have everyone follow the above steps.

The approach encourages reviewing and spreading knowledge across the organization. Besides, If everyone agrees on how the project should be done then writing the approach down should be a piece of cake.

Below is a potential template for two different types of applications.

  • Backend
    • List of approvers
    • Abstract (what is the project about?)
    • Architecture changes
    • Service SLAs
    • Service dependencies
    • Load & performance testing
    • Multi data-center concerns
    • Security considerations
    • Testing & rollout
    • Metrics & monitoring
    • Customer support considerations
  • Mobile/Web
    • List of approvers
    • Abstract (what is the project about?)
    • UI & UX
    • Architecture changes
    • Network interactions detailed
    • Library dependencies
    • Security concerns
    • Testing & rollout
    • Analytics
    • Customer support considerations
    • Accessiblity

HTTP/3 explained

http3-explained.haxx.se

  • Why QUIC? No HoL, Reduced latency, Always secure, Combating ossification, No 3-way handshake.
  • gQUIC != IETF-QUIC.
  • HTTP/3 = HTTP-over-QUIC.
  • Features:
    • QUIC is over UDP.
    • QUIC adds layer atop UDP to introduce reliability, including re-transmissions of packets, congestion control, pacing and the other features otherwise present in TCP.
    • QUIC features separate logical streams within the physical connections.
    • A QUIC connection is made to a UDP port and IP address, but once established the connection is associated by its "connection ID".
    • QUIC guarantees in-order delivery of streams, but not between streams.
    • QUIC offers both 0-RTT and 1-RTT connection setups.

Functional Programming Fundamentals

www.matthewgerstman.com

Functional programming (often abbreviated FP) is the process of building software by composing pure functions. A pure function is a function which given the same inputs, always returns the same output, and has no side-effects.

You should not mutate the input. Otherwise, the result is predictable.

Write declarative code, instead of imperative code. For example, lodash.keyBy(files, 'id'); is better than a for-loop.

If you try to perform effects and logic at the same time, you may create hidden side effects which cause bugs in the logic. Keep functions small. Do one thing at a time, and do it well.

Plan for composition. Write functions whose outputs will naturally work as inputs to many other functions. Keep function signatures as simple as possible.

Immutability. The true constant is in change. Mutation hides change. Hidden change manifests chaos. Therefore, the wise embrace history.

Memoization is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.

Use high order functions:

  • Function that takes a function
  • Function that returns a function
  • Function that takes a function and returns a function
  • Currying
  • Partial Application

The Elements of UI Engineering

overreacted.io

The author listed out a few interesting problems for UI Engineering.

  • Consistency. How do we keep the same data in sync on different parts of the screen? How and when do we make the local data consistent with the server, and the other way around?
  • Responsiveness. People can only tolerate a lack of visual feedback to their actions for a limited time. How do we keep our apps responsive to different kinds of inputs?
  • Latency. Both computations and network access take time. How do we gracefully handle latency without displaying a “cascade” of spinners or empty “holes”? How do we avoid “jumpy” layout? And how do we change async dependencies without “rewiring” our code every time?
  • Navigation. We expect that the UI remains “stable” as we interact with it. How do we architect our app to handle arbitrary navigation without losing important context?
  • Staleness. How do we handle cache data properly?
  • Entropy. How do we tame the combinatorial explosion of possible states and make visual output predictable?
  • Priority. How do we get independent widgets to cooperate instead of fighting for resources?
  • Accessibility. What can we do to make accessibility a default rather than an afterthought?
  • Internationalization. How do we support different languages without sacrificing latency and responsiveness?
  • Delivery. How do we choose at which point to introduce latency? How do we optimize our delivery based on the usage patterns? What kind of data would we need for an optimal solution?
  • Resilience. How do we write code in a way that isolates rendering and fetching failures and keeps the rest of the app running? What does fault tolerance mean for user interfaces?
  • Abstraction. How do we create abstractions that hide implementation details of a particular UI part? How do we avoid re-introducing the same problems that we just solved as our app grows?

Think of a non-trivial UI element from an app you enjoy using, and go through this list of problems. Can you describe some of the tradeoffs chosen by its developers? Try to recreate a similar behavior from scratch!

Looking Back at Postgres

arxiv.org

It is a recollection of the UC Berkeley Postgres project. It's a fun paper that introduces the history of Postgres.

Lesson learned:

  • Design your software for extensibility. Sometimes, a successful first system failed to follow up a second system, but it didn't happen on Postgres. With extensibility as an architectural core, it is possible to be creative and stop worrying so much about discipline: you can try many extensions and let the strong succeed.
  • One size fits many.

Paper review. Serverless computing: One step forward, two steps back

muratbuffalo.blogspot.com

The shortcomings of current FaaS offerings:

  • Limited Lifetimes. After 15 minutes, function invocations are shut down.
  • I/O Bottlenecks. Lambdas connect to cloud services—notably, shared storage—across a network interface. The bandwidth is quite small.
  • Communication Through Slow Storage.
  • No Specialized Hardware.

Current FaaS offerings are making a trade-off between efficiency and easy-to-program pay-only-per-invocation autoscaling functionality. FaaS is all about optimizing developer productivity and time-to-market. A good strategy is to make the prototype on Lambda and then "make it right".

Noria: dynamic, partially-stateful data-flow for high-performance web applications

usenix.org

Noria is a partially-stateful data-flow model. The model makes data-flow viable for building long-lived, low-latency applications, such as web applications. It provides better performance than a typical MySQL/memcached stack. It supplies a relational schema, a set of parameterized queries, and sharing states across related queries.

Keeping CALM: When Distributed Consistency is Easy

arxiv.org

Consistency as Logical Monotonicity (CALM). A program has a consistent, coordination-free distributed implementation if and only if it is monotonic. A CALM system is a NoCo system, “No Coordination.” The CALM is to about avoid coordination when designing high performing scalable distributed systems.

  • essential coordination, a guarantee that cannot be provided without coordinating
  • accidental coordination, coordination that could have been avoided with a more careful design.

A key concern in modern distributed systems is to avoid the cost of coordination while maintaining consistent semantics. CALM is an acronym for "consistency as logical monotonicity".

Distributed systems theory is dominated by fearsome negative results, such as the Fischer/Lynch/Patterson impossibility proof, the CAP Theorem, and the two generals problem.

CALM shows that monotonicity, a property of a program, implies consistency, a property of the output of any execution of that program. Though CLAM is only a constructive result. System builder still needs to answer two questions:

  • Is whether the problem they are trying to solve has a monotonic specification?
  • Given a monotonic specification for a problem, how can I implement it in practice?

Would you still pick Elixir in 2019?

github.com | news.ycombinator.com

Pros: easy to learn, beautifully designed language, macros, baked-in test, BEAM VM, error handling, incoming aws lambda support, etc. Cons: few human resources on the market.

Resiliency: Rate Limiting, Retries, Bulkheads

medium.com (rate limiting) | medium.com retries (retries) | medium.com (Bulkheads)

Rate limiting can mitigate API resource exhaustion from a greedy client.

Retries are a powerful resilience primitive which allows a client to offer higher availability than its dependencies. Retries should be used with care since an unrefined retry policy could cause result in a denial of service-like attack.

Bulkheads are an effective way to bound resource usage. They ensure isolation and help to mitigate cascading failures through preventing overload. Bulkheads are not a silver bullet but one of many primitive patterns for creating resilient highly available applications.

What Are The Best Software Engineering Principles?

dev.to

  • Measure twice and cut once
  • Don’t Repeat Yourself
  • Occam’s Razor
  • Keep It Simple Stupid
  • You Aren’t Gonna Need It
  • Big Design Up Front
  • Avoid Premature Optimization
  • Principle Of Least Astonishment
  • S.O.L.I.D.
  • Law of Demeter

The Internals of PostgreSQL, for database administrators and system developers

www.interdb.jp

In this document, the internals of PostgreSQL for database administrators and system developers are described.

MySQL High Availability Framework Explained – Part II

highscalability.com

The post discusses the details of MySQL semisynchronous replication and the related configuration settings that help us ensure redundancy and consistency of the data in our HA setup.

Semisynchronous replication: the master commits transactions to the storage engine only after receiving an acknowledgment from at least one of the slaves.

Learn Enough Docker to be Useful - Part I

towardsdatascience.com

A Docker container holds things, is portable, has clear interfaces for access, and can be obtained from a remote location .

Operable Software

ferd.ca

The post covers views on simplicity and complexity, how people actually approach their systems and form mental models of them, and how we should instead structure things if we want to make systems both observable and operable.

  • A simple system can work (but is not guaranteed to do so)
  • A complex system can work, usually if it has been grown from a small system. Trying to ship a complex system the first time around is usually not going to go great
  • Complexity is unavoidable as your system scales; If people rely on a system for something serious, it will inevitably become complex
  • A simple system behaves differently from a large system, even if both provide the same functionality

To make a software operable, it's inevitable to add more components. More importantly, don't forget the people who are interpreting the information of these components.

Since people in the system are always the last hope of resolving a problem when tooling and automation fails, we should put more effort into supporting them, in providing a good operator experience.