EnqueueZero Techshack 2019-01

Kia ora! I am the creator of Enqueue Zero (https://enqueuezero.com), a site that explains code principles to Developers/DevOps/Sysadmins. The site has been run by a one-person team since 2018. The `EnqueueZero Techshack` series collects interesting posts that I wrote or found on the Internet. Above all, I hope you enjoy these posts.

To keep me motivated and this website ad-free, I would appreciate your kind donation!

Donate to this project on Patreon

Please follow the Twitter account @enqueuezero for updates. Happy exploring!

Kubernetes In a Nutshell

enqueuezero.com

Kubernetes is a system for running and coordinating containerized applications. These applications are deployed across a cluster of machines.

As a Kubernetes user, you define how the application should run. You also define how the application should interact with other applications or the outside world.

Kubernetes brings individual machines together into a cluster using a shared network.

  • Master server(s) control the entire cluster.
  • Nodes are the other servers in the cluster.
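
To make "you define how the application should run" a bit more concrete, here is a minimal sketch in Python that builds a Deployment manifest as a plain dictionary and prints it as JSON, a format that `kubectl apply -f` accepts alongside YAML. The application name `hello-web`, the image `nginx:1.25`, the replica count, and the port are hypothetical values picked for illustration.

```python
import json


def deployment_manifest(name, image, replicas, port):
    """Build an apps/v1 Deployment manifest as a plain dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # How many copies of the application the cluster should keep running.
            "replicas": replicas,
            # The selector must match the labels on the pod template below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Hypothetical application; every value here is an illustration only.
manifest = deployment_manifest("hello-web", "nginx:1.25", replicas=3, port=80)
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and handing it to the cluster is the "define how the application should run" step; exposing it to other applications or the outside world would need a further object such as a Service.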

How Discord Stores Billions of Messages

blog.discordapp.com

The underlying database is Cassandra. The reasons for choosing Cassandra were:

  • Read/Write ratio is about 50/50.
  • Many random reads.
  • Large dataset.
  • Uneven amounts of data between private, text-chat-heavy Discord servers and large public Discord servers.

Requirements were:

  • Linear scalability.
  • Automatic failover.
  • Low maintenance.
  • Proven to work.
  • Predictable performance.
  • Not a blob store.
  • Open source.

The data model buckets messages by time. The primary key for a message is ((channel_id, bucket), message_id).

CREATE TABLE messages (
   channel_id bigint,
   bucket int,
   message_id bigint,
   author_id bigint,
   content text,
   PRIMARY KEY ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
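
As a rough sketch of how a `bucket` value could be derived, the Python below maps a message timestamp to a fixed-width time window and pairs it with the channel to form the partition key. The bucket width (10 days) and the epoch (2015-01-01 UTC) are assumptions made for this example, and `bucket_for` and `partition_key` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: buckets are fixed-width windows counted from a chosen epoch.
BUCKET_WIDTH = timedelta(days=10)                  # hypothetical width
EPOCH = datetime(2015, 1, 1, tzinfo=timezone.utc)  # hypothetical epoch


def bucket_for(sent_at):
    """Return the integer bucket that a message timestamp falls into."""
    return (sent_at - EPOCH) // BUCKET_WIDTH


def partition_key(channel_id, sent_at):
    """Return the (channel_id, bucket) pair used as the partition key."""
    return (channel_id, bucket_for(sent_at))


sent_at = EPOCH + timedelta(days=25)
print(partition_key(42, sent_at))
```

Because the bucket is part of the partition key, one busy channel is spread over many bounded partitions instead of growing a single partition forever.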

Cassandra is an AP database, meaning it trades consistency for availability. To resolve the resulting dilemma, the engineers decided to delete a record from the database when a fault is detected in it. A delete is performed as a form of write called a “tombstone.” On read, Cassandra simply skips over any tombstones it comes across. Tombstones are retained for 10 days by default; Discord lowered this to two days.
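
The tombstone idea can be sketched with a toy in-memory store. This is only an illustration of "a delete is a write, and reads skip it"; it is not how Cassandra is implemented, and `ToyStore`, `TOMBSTONE`, and `TOMBSTONE_TTL` are hypothetical names.

```python
TOMBSTONE = object()           # sentinel that marks a deleted row
TOMBSTONE_TTL = 2 * 24 * 3600  # two days in seconds, the retention mentioned above


class ToyStore:
    """In-memory stand-in for a table, keyed by message id."""

    def __init__(self):
        self.rows = {}  # key -> (value, written_at)

    def write(self, key, value, now):
        self.rows[key] = (value, now)

    def delete(self, key, now):
        # A delete is just another write: it stores a tombstone marker.
        self.write(key, TOMBSTONE, now)

    def scan(self):
        # Reads skip over tombstones instead of returning them.
        return {k: v for k, (v, _) in self.rows.items() if v is not TOMBSTONE}

    def compact(self, now):
        # Tombstones older than the retention window are finally dropped.
        self.rows = {
            k: (v, t)
            for k, (v, t) in self.rows.items()
            if v is not TOMBSTONE or now - t < TOMBSTONE_TTL
        }


store = ToyStore()
store.write(1, "hello", now=0)
store.write(2, "oops", now=0)
store.delete(2, now=10)
print(store.scan())
```

Until `compact` runs after the retention window, the deleted row still occupies space and every read has to step over it, which is why a large number of tombstones makes reads slower.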

Everything You Need To Know About Networking On AWS

dev.to

                                        +-------+
                                        | ig-1  |
                                        |       |
        vpc-123: 10.0.0.0/16  |         |       |        |
       +----------------------+---------+-------+--------+---------------------+
       |                      |                          |                     |
       |  +-----+             |  +-----+                 |  +-----+            |
       |  | NAT |             |  | NAT |                 |  | NAT |            |
public |  |     |             |  |     |                 |  |     |            |
subnets|  +-----+             |  +-----+                 |  +-----+            |
       |                      |                          |                     |
       |                      |                          |                     |
       |                      |                          |                     |
       |              +-------+                  +-------+             +-------+
       |              | rt-1a |                  | rt-1b |             | rt-1c |
       | 10.0.1.0/24  |       | 10.0.2.0/24      |       | 10.0.3.0/24 |       |
-------+-----------------------------------------------------------------------+
       | 10.0.4.0/24  | rt-2a | 10.0.5.0/24      | rt-2b | 10.0.6.0/24 | rt-2c |
       |              |       |                  |       |             |       |
       |              +-------+                  +-------+             +-------+
private|                      |                          |                     |
subnets|                      |                          |                     |
       |                      |                          |                     |
       |                      |                          |                     |
       |                      |                          |                     |
       |                      |                          |                     |
       |                      |                          |                     |
       |                      |                          |                     |
       +----------------------+--------------------------+---------------------+
       |         AZ 1         |          AZ 2            |        AZ 3         |
  • A virtual private cloud or VPC is a private network space in which you can run your infrastructure.
    • It has an address space (CIDR range) that you choose, e.g. 10.0.0.0/16. The range determines how many IP addresses you can have.
  • In a VPC, there are public subnets and private subnets, depending on whether traffic can reach them from outside the VPC (the Internet).
  • As long as one Availability Zone or AZ is alive, your service should be able to operate.
  • A routing table contains rules about how IP packets in the subnets can travel to different IP addresses.
  • An Internet Gateway allows traffic to and from the outside world.
  • A NAT gateway is a device that sits in a public subnet, accepts any IP packets bound for the Internet coming from the private subnets, sends those packets on to their destination, and then sends the returning packets back to the source.
  • A security group specifies ingress (inbound) and egress (outbound) traffic rules, limiting them to certain sources (inbound) and destinations (outbound).
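
The address arithmetic behind the diagram can be checked with Python's standard `ipaddress` module. The sketch below uses the CIDR ranges from the diagram; the variable names are my own.

```python
import ipaddress

# The VPC range from the diagram.
vpc = ipaddress.ip_network("10.0.0.0/16")
print(vpc.num_addresses)  # total addresses in the VPC range

# One public and one private /24 subnet per Availability Zone, as drawn above.
public = [ipaddress.ip_network(f"10.0.{i}.0/24") for i in (1, 2, 3)]
private = [ipaddress.ip_network(f"10.0.{i}.0/24") for i in (4, 5, 6)]
all_subnets = public + private

# Every subnet must sit inside the VPC range.
inside_vpc = all(subnet.subnet_of(vpc) for subnet in all_subnets)

# No two subnets may overlap.
overlapping = any(
    a.overlaps(b)
    for i, a in enumerate(all_subnets)
    for b in all_subnets[i + 1:]
)

print(inside_vpc, overlapping)
print(public[0].num_addresses)  # addresses in a single /24 subnet
```

Note that AWS reserves five addresses in every subnet, so the usable count per subnet is slightly lower than the raw figure.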

An Idiot’s guide to Support vector machines (SVMs)

web.mit.edu

  • Support vectors are the data points that lie closest to the decision surface. They are the data points most difficult to classify.
  • Support Vector Machine
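
To illustrate "the data points that lie closest to the decision surface", here is a small sketch that measures the distance from a few 2-D points to a fixed line and picks the nearest ones. The line (`w`, `b`) and the points are made-up values; a real SVM would learn the line that maximizes this smallest distance rather than take it as given.

```python
import math

# Hypothetical decision surface: w[0]*x + w[1]*y + b = 0
w = (1.0, 1.0)
b = -3.0

# Hypothetical data points.
points = {
    "a": (0.0, 0.0),
    "b": (1.0, 1.0),
    "c": (2.0, 2.0),
    "d": (4.0, 4.0),
}


def distance(x):
    """Perpendicular distance from point x to the decision surface."""
    return abs(w[0] * x[0] + w[1] * x[1] + b) / math.hypot(*w)


# The margin is the smallest distance; the points at that distance are the
# ones that would act as support vectors for this surface.
margin = min(distance(x) for x in points.values())
closest = sorted(
    name for name, x in points.items() if math.isclose(distance(x), margin)
)
print(closest, round(margin, 3))
```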

How do you document a tech project with comics?

jvns.ca

  • Comics without good information are useless.
  • Focus on concepts that don’t change.
  • Make a single graphic with a pretty small amount of information.
  • Make a more in-depth comic / illustration

High-Reliability Infrastructure Migrations

youtube.com

The talk is about how to improve availability from 99% to 99.99%.
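
For a sense of scale, the arithmetic below converts each availability target into allowed downtime per year, assuming a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def downtime_hours_per_year(availability_percent):
    """Hours per year a service may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

At 99% the budget is about 87.6 hours (roughly 3.65 days) a year; at 99.99% it shrinks to under an hour.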

The suggested practices are:

  • Understand the design.
  • Run gamedays.
  • Classify failures.
  • Have incidents only once.
  • Make incremental changes.
  • Always have a rollback plan.

It's okay to start out without being an expert, but you need to become one!

Introducing Pipelines to Airbnb’s Deployment Process

medium.com

Check out this post on how Airbnb designed its deployment process to minimize potential misuse.

  • Develop a specification for a configuration file that service owners can use to define their deployment procedures (i.e., pipeline).
  • Save pipeline configurations in a database.
  • Give the frontend a method of accessing this data.
  • Separate and order the targets correctly in the UI display.
  • Disable targets that shouldn’t be deployed to (i.e., targets whose dependencies have not yet been deployed to).
  • Finally, implement the UI.
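
The "disable targets" step can be sketched as a dependency check over the pipeline configuration. The shape of the configuration and the target names below are hypothetical; the post does not show Airbnb's actual format.

```python
# Hypothetical pipeline: each target lists the targets that must be deployed first.
PIPELINE = {
    "staging": [],
    "canary": ["staging"],
    "production": ["canary"],
}


def disabled_targets(pipeline, deployed):
    """Return targets that have at least one dependency not yet deployed."""
    return [
        target
        for target, deps in pipeline.items()
        if any(dep not in deployed for dep in deps)
    ]


print(disabled_targets(PIPELINE, deployed=set()))
print(disabled_targets(PIPELINE, deployed={"staging"}))
```

A UI built on top of this would grey out whatever `disabled_targets` returns and re-evaluate it after each successful deploy.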

Learn eBPF Tracing: Tutorial and Examples

www.brendangregg.com | bcc | bpftrace | bcc Python guide

Check out Brendan Gregg's new blog post, a beginner's guide to eBPF.

  • What is eBPF? eBPF does to Linux what JavaScript does to HTML. With eBPF, you can write mini-programs that run on events such as disk I/O, inside a safe virtual machine in the kernel.
  • What are bcc and bpftrace? Just as no one writes V8 bytecode by hand, few people write eBPF programs directly. Instead, they use front ends such as bcc and bpftrace.
  • Companies including Netflix and Facebook have bcc installed on all servers by default, and maybe you'll want to as well.
  • Check the bcc Beginner's Tutorial. Many more tracing tools exist.
  • Intermediate users can switch from bcc to bpftrace, which has a high-level language that is much easier to learn.
  • An example of bpftrace one-liner:
$ bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%d %s\n", pid, str(args->filename)); }'

Docker Internals

docker-saigon.github.io

A Deep Dive Into Docker For Engineers Interested In The Gritty Details.

  • What is a container?
    • Containers share the host kernel
    • Containers use the kernel's ability to group processes for resource control
    • Containers ensure isolation through namespaces
    • Containers feel like lightweight VMs (lower footprint, faster), but are not Virtual Machines!
  • How do containers compare to Package Managers?
    • Package managers failed us because differences in shared-library versions cause dependency issues; packaging shared libraries in an image works around that.
  • How do containers compare to Configuration Management?
    • It is still advisable to leverage such a provisioning tool to bootstrap the Docker infrastructure, letting the Container Runtime layer take care of the application layer once it is ready.
  • Why Docker?
    • Because Docker is currently the only ecosystem providing the full package of Image management, Resource Isolation, File System Isolation, Network Isolation, Change Management, Sharing, Process Management, and Service Discovery (DNS since 1.10).
  • How? Kernel namespaces, cgroups, iptables, copy-on-write UnionFS.
  • Container Runtimes: LXC, Systemd-nspawn, runC
  • How to hook into the various Docker components?

Everything curl

ec.haxx.se | github.com

Without a doubt, curl is great software. Its author published a book, "Everything curl," which is a 10/10 and well worth reading.

  • You can learn the code layout: root, src, lib, docs, etc.
  • It uses a traditional C code style (https://ec.haxx.se/sourcecode-style.html): two-space indentation instead of tabs, /* comment */ comments, lines broken before 80 characters, the open brace on the same line, else on the following line, and no space before parentheses.
  • Networking simplified. The basic concepts are the machine, URL, DNS, TCP connection, IP & Port, TLS, and transfer protocol.
  • The world is full of protocols, both old and new. The latest curl supports DICT, FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMB, SMBS, SMTP, SMTPS, TELNET, TFTP.
    • Like software, protocol specifications are frequently updated and new protocol versions are created.
    • Protocols come and go. The IETF and its RFCs keep them stable.
  • Usage: curl -v https://enqueuezero.com
  • How to HTTP with curl? https://ec.haxx.se/http.html.
  • To use curl as a C library instead of a command-line utility, try libcurl. Simple by default, more on demand! https://ec.haxx.se/libcurl.html

GraphQL Server Design @ Medium

medium.com

This article explains how the structure of Medium's GraphQL server helped make the migration much smoother.

Some decisions were made:

They came up with a layered architecture. As discussed in this post https://enqueuezero.com/layered-architecture.html, "Each layer has its own responsibilities, and only interacts with the layer below it". The proposed layers are:

  • Schema. It is the form the data takes when it is sent to the clients.
  • Repositories. They “store” the cleaned-up data that originally came from data sources. In traditional terms, this is like the controller layer in MVC, massaging and shaping the data.
  • Fetchers. They are used by the GraphQL server to fetch data from multiple data sources.
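
A minimal sketch of the three layers in Python, where each function calls only the layer directly below it. The record, its field names, and all function names are hypothetical; Medium's actual server code is not shown in the post summary above.

```python
# Hypothetical raw record, standing in for a response from a backend service.
RAW_POSTS = {
    "p1": {
        "post_id": "p1",
        "ttl": "  Hello GraphQL ",
        "author_uid": "u7",
        "internal_flag": 1,
    }
}


def fetch_post(post_id):
    """Fetcher layer: get raw data from a data source, unmodified."""
    return RAW_POSTS[post_id]


def post_repository(post_id):
    """Repository layer: clean up and rename what the fetcher returned."""
    raw = fetch_post(post_id)
    return {
        "id": raw["post_id"],
        "title": raw["ttl"].strip(),
        "author_id": raw["author_uid"],
    }


def resolve_post(post_id):
    """Schema layer: expose only the fields the client-facing type declares."""
    post = post_repository(post_id)
    return {"id": post["id"], "title": post["title"]}


print(resolve_post("p1"))
```

Because the schema layer never sees the raw record, a data source can be swapped by changing only the fetcher and, if needed, the repository that cleans up after it.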

GraphQL server