I made a DSL

and I liked it

Email Security 101

You can publish a DNS record (DMARC) that asks receiving mail servers to email you reports about mail sent as your domain.

DNS Record

$ dig +noall +answer _dmarc.robertroskam.com txt

output

v=DMARC1; 
p=quarantine; 
rua=mailto:0y8hka6m@ag.dmarcian.com

Let's dig my domain's DMARC record
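A DMARC record is just a semicolon-separated list of tag=value pairs, so it is easy to pick apart. The sketch below is a minimal illustration only; the function name parse_dmarc_record is made up for this example and it does no validation of the tags.

```python
def parse_dmarc_record(txt: str) -> dict[str, str]:
    """Split a DMARC TXT record into its tag=value pairs."""
    tags = {}
    for part in txt.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after a trailing ';'
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = part.partition("=")
        tags[key.strip()] = value.strip()
    return tags


record = "v=DMARC1; p=quarantine; rua=mailto:0y8hka6m@ag.dmarcian.com"
tags = parse_dmarc_record(record)
print(tags["p"])    # policy applied to failing mail
print(tags["rua"])  # where aggregate reports are sent
```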

DMARC Report

<?xml version="1.0" ?>
<feedback>
  <report_metadata>
    <org_name>google.com</org_name>
    <email>noreply-dmarc-support@google.com</email>
    <extra_contact_info>https://support.google.com/a/answer/2466580</extra_contact_info>
    <report_id>3766706526427983302</report_id>
    <date_range>
      <begin>1626220800</begin>
      <end>1626307199</end>
    </date_range>
  </report_metadata>
  <policy_published>
    <domain>robertroskam.com</domain>
    <adkim>r</adkim>
    <aspf>r</aspf>
    <p>quarantine</p>
    <sp>quarantine</sp>
    <pct>100</pct>
  </policy_published>
  <record>
    <row>
      <source_ip>209.85.220.41</source_ip>
      <count>4</count>
      <policy_evaluated>
        <disposition>quarantine</disposition>
        <dkim>fail</dkim>
        <spf>fail</spf>
      </policy_evaluated>
    </row>
    <identifiers>
      <header_from>robertroskam.com</header_from>
    </identifiers>
    <auth_results>
      <spf>
        <domain>robertroskam.com</domain>
        <result>softfail</result>
      </spf>
    </auth_results>
  </record>
</feedback>
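To get from a report like this to something you can group and label, the first step is pulling out the per-source rows. A minimal sketch using the standard library's xml.etree.ElementTree follows; the REPORT string is a trimmed copy of the report above and the summarize helper is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the report above, keeping only the fields used here.
REPORT = """<feedback>
  <record>
    <row>
      <source_ip>209.85.220.41</source_ip>
      <count>4</count>
      <policy_evaluated>
        <disposition>quarantine</disposition>
        <dkim>fail</dkim>
        <spf>fail</spf>
      </policy_evaluated>
    </row>
    <identifiers>
      <header_from>robertroskam.com</header_from>
    </identifiers>
  </record>
</feedback>"""


def summarize(report_xml: str) -> list[dict]:
    """Return one dict per <record> with the fields a labeling rule would need."""
    root = ET.fromstring(report_xml)
    rows = []
    for record in root.iter("record"):
        rows.append({
            "source_ip": record.findtext("row/source_ip"),
            "count": int(record.findtext("row/count")),
            "disposition": record.findtext("row/policy_evaluated/disposition"),
            "header_from": record.findtext("identifiers/header_from"),
        })
    return rows


rows = summarize(REPORT)
print(rows)
```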

Observations

  • IP addresses aren't human-readable
  • Server names from reverse DNS lookups might not be meaningful
  • You need another layer to group and label the traffic

Enter Dmarcian.com

  • Over 1k rules
  • Based on >1 trillion records
  • Written by professional analysts

Rule Examples

Label: Salesforce - ID: 105

asn == 22606

Label: Active Campaign - ID: 107

ip_in_netblocks(ip, ['67.228.34.32/27', '52.128.40.0/21']) or 
regex(ptr_org, 'emsend[1-8].com')

Label: Hubspot - ID: 106

ptr_org in ('hubspot.com', 'hubspotemail.net')
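The rules call helpers such as ip_in_netblocks and regex. The slides do not show how those are implemented, so the versions below are only a guess at their behavior, built on the standard ipaddress and re modules. In particular, whether the real regex helper searches or anchors at the start is an assumption.

```python
import ipaddress
import re


def ip_in_netblocks(ip: str, netblocks: list[str]) -> bool:
    """True if `ip` falls inside any of the CIDR blocks (assumed semantics)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(block) for block in netblocks)


def regex(value: str, pattern: str) -> bool:
    """True if `pattern` matches anywhere in `value` (assumed: search, not fullmatch)."""
    return re.search(pattern, value) is not None


blocks = ['67.228.34.32/27', '52.128.40.0/21']
print(ip_in_netblocks('67.228.34.40', blocks))
print(ip_in_netblocks('209.85.220.41', blocks))
print(regex('emsend3.com', 'emsend[1-8].com'))
```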

Human-Readable Output

The Motivation

Give our end users the same tools as our analysts.

Why?

Sometimes our users know better than us what particular traffic actually is

ip == '209.85.220.41' and ptr_org == 'robertroskam.com'

Label: Roskam's Home - ID: 108

So just expose the internal authoring system, right?

All rules were written in Python and simply eval'd in production.

NO!
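Why "NO!": eval runs whatever Python it is handed. The sketch below is not the production runtime, just a small demonstration with made-up variable names. It shows a normal rule evaluating as expected, and then a hostile "rule" reaching Python's class hierarchy even though the builtins have been stripped out.

```python
# A well-behaved rule, evaluated against one record of traffic.
rule = "ip == '209.85.220.41' and ptr_org == 'robertroskam.com'"
traffic = {"ip": "209.85.220.41", "ptr_org": "robertroskam.com"}
result = eval(rule, {"__builtins__": {}}, traffic)
print(result)

# A hostile "rule": no builtins needed, it walks from a tuple literal up to
# `object` and lists every class loaded in the interpreter.
hostile = "().__class__.__base__.__subclasses__()"
leaked = eval(hostile, {"__builtins__": {}}, traffic)
print(len(leaked), "classes reachable from an 'empty' sandbox")
```

Removing builtins is a common first attempt at sandboxing eval, and the second half shows why it is not enough on its own.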

What we like

  • Features: we gave our analysts a very robust feature set, and it has made them very productive. They went from 1k rules to 1.5k rules in about 6-8 months after our new authoring tool arrived. The first 1k rules took 7 years to write.
  • Performance: the entire 1.5k+ ruleset takes 1-2 ms per traffic record. We process ~10 million records of traffic each day, and that volume doubles every 4-8 months.

What we don't like

  • Security: users could be malicious with pure Python
  • Feedback: the existing approach did some inspection of the incoming rule, but could give little feedback when the code did not parse into a valid Python AST
  • Global rules: we had no means to separate the rules on a per-account basis, because our rules authoring engine was centralized

Requirements

  • Prevent injection attacks
  • Give feedback to users during authoring
  • Enable per-account not just global authoring
  • Support existing features in 1.5k+ rules
  • Have the performance be no worse at runtime

Areas of Concern

Authoring

  • Prevent injection attacks
  • Give feedback to users

Runtime

  • Enable per-account authoring
  • Support existing features in 1.5k+ rules
  • Have the performance be no worse

Authoring Requirements

  • Prevent injection attacks

  • Give feedback to users

Injection Attacks

Is it safer to...

block these?

locals()
import
:=
match

or allow these?

==
!=
in
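One way to make the "allow" option concrete is an allowlist of syntax node types: anything not on the list is rejected, including constructs that did not exist when the list was written. The sketch below uses Python's own ast module as a stand-in for the DSL's parser, and the ALLOWED_NODES set is an assumption picked to cover the operators on this slide.

```python
import ast

# Assumed allowlist: boolean logic, ==, !=, in, names, literals and lists.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or,
    ast.Compare, ast.Eq, ast.NotEq, ast.In,
    ast.Name, ast.Load, ast.Constant, ast.List,
)


def disallowed(source: str) -> list[str]:
    """Return the names of node types in `source` that are not on the allowlist."""
    tree = ast.parse(source, mode="eval")  # statements such as `import` fail here
    found = {type(node).__name__ for node in ast.walk(tree)
             if not isinstance(node, ALLOWED_NODES)}
    return sorted(found)


print(disallowed("asn == 22606"))
print(disallowed("locals()"))
print(disallowed("(x := 1)"))
```

A blocklist would have to name locals(), import, := and match one by one, and would silently let through whatever the next Python release adds.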

Authoring Requirements

  • Prevent injection attacks. Choices are:

    • Block

    • Allow

  • Give feedback to users

What kind of feedback?

Syntax

abc == 123
asn == 1234 and
asn == 123 and ( ptr_org == 'foo.com' or h_from == 'm.foo.com'

Semantics

asn == '123'
ptr_org in asn
regex(ptr_org, 123)
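The difference between the two kinds of feedback can be sketched in a few lines. This is an illustration built on Python's ast module, not the DSL's own checker; the variable types in VAR_TYPES and the signature in FUNC_ARGS are assumptions, and unknown names are reported as syntax errors here because the slides describe the tokenizer as rejecting them.

```python
import ast

VAR_TYPES = {"asn": int, "ptr_org": str, "h_from": str}  # assumed types
FUNC_ARGS = {"regex": (str, str)}                        # assumed signature


def type_of(node):
    """Best-effort static type of an expression node, or None if unknown."""
    if isinstance(node, ast.Constant):
        return type(node.value)
    if isinstance(node, ast.Name):
        return VAR_TYPES.get(node.id)
    if isinstance(node, (ast.List, ast.Tuple)):
        return list
    return None


def feedback(rule: str) -> str:
    try:
        tree = ast.parse(rule, mode="eval")
    except SyntaxError as exc:
        return f"syntax: {exc.msg}"
    known = set(VAR_TYPES) | set(FUNC_ARGS)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id not in known:
            return f"syntax: unknown name {node.id!r}"
    for node in ast.walk(tree):
        if isinstance(node, ast.Compare):
            left, right = type_of(node.left), type_of(node.comparators[0])
            op = node.ops[0]  # chained comparisons are ignored in this sketch
            if isinstance(op, (ast.Eq, ast.NotEq)) and None not in (left, right) and left is not right:
                return f"semantics: cannot compare {left.__name__} with {right.__name__}"
            if isinstance(op, ast.In) and right not in (list, str, None):
                return "semantics: right side of 'in' must be a list or string"
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            expected = FUNC_ARGS.get(node.func.id)
            if expected and tuple(type_of(a) for a in node.args) != expected:
                return f"semantics: wrong argument types for {node.func.id}()"
    return "ok"


for example in ["asn == 1234 and", "asn == '123'", "ptr_org in asn", "regex(ptr_org, 123)"]:
    print(example, "->", feedback(example))
```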

Authoring Requirements

  • Prevent injection attacks. Choices are:

    • Block

    • Allow

  • Give feedback to users:

    • syntax

    • semantics

Does this already exist?

Nope

Time to make a DSL

Phases

  • Plan: Determine the grammar

  • Implement: Lexer, Parser, Compiler

  • Deploy: Integrate into Existing Authoring System as beta

Plan

  • Problem #1: determine what all we use
  • Problem #2: DSL syntax 

Problem #1: determine what all we use

  • Historically, "just use Python" was the spec for analysts, so we had no idea which parts of the language were actually being used
  • We wrote a script using Python's ast module that dumped out every token found across our existing rules.
  • That's when the surprises rolled in:
    • Comparison operators: only == and !=, never < or >
    • We had both lists and tuples being used
    • Certain variables injected but completely unused
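A survey script along these lines is short. The one below is a reconstruction under assumptions, not the original script: it counts AST node types across a list of rule strings, using the example rules from earlier in the deck as input.

```python
import ast
from collections import Counter


def survey(rules: list[str]) -> Counter:
    """Count how often each AST node type appears across all rules."""
    counts = Counter()
    for rule in rules:
        tree = ast.parse(rule, mode="eval")
        counts.update(type(node).__name__ for node in ast.walk(tree))
    return counts


rules = [
    "asn == 22606",
    "ptr_org in ('hubspot.com', 'hubspotemail.net')",
    "ip_in_netblocks(ip, ['67.228.34.32/27', '52.128.40.0/21']) "
    "or regex(ptr_org, 'emsend[1-8].com')",
]
counts = survey(rules)
for name, n in counts.most_common():
    print(f"{name:12} {n}")
```

Even on three rules the output shows both Tuple and List in use and no Lt or Gt at all, which is the kind of surprise the bullets above describe.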

Problem #2: DSL syntax

  • Since we could make the syntax different, should we?
    • using AND/OR instead of and/or
    • or using = instead of ==
  • For the initial pass, we decided to support a subset of Python's syntax, without limiting ourselves to that permanently.
  • We decided against supporting both lists [,] and tuples (,). We wanted just one set of symbols, and we went with lists using [,]
  • Because the DSL is a subset of Python's syntax, we didn't have to immediately rewrite the production runtime to use our new compiler. That was a win for deployment compatibility.

Implement

  • Made as an internal library; separate git repo
  • Tokenizer
    • It knows exactly which variables to expect
    • It can reject anything it doesn't expect
  • Parser
    • Hand-written LR parser, working depth first
    • The parser does the heavy lifting: syntax checking for boolean logic and parentheses, plus semantic checking
    • We even check the argument types of each function we have, and their outputs
  • Compiler
    • Targets Python for this initial build and simply unrolls the AST using a map from each symbol to Python
  • ~70 tests (pytest + parametrize); GitLab CI: tests, black, flake8
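The three pieces can be sketched end to end in under a hundred lines. This is a toy under stated assumptions, not the internal library: the variable names and types are invented, function calls are left out, and the parser is a small recursive-descent one rather than the hand-written LR parser described above.

```python
import re

VARIABLES = {"asn": int, "ptr_org": str, "ip": str}  # assumed names and types
KEYWORDS = {"and", "or", "in"}
TOKEN_RE = re.compile(r"""\s*(?:
      (?P<NUMBER>\d+)
    | (?P<STRING>'[^']*')
    | (?P<OP>==|!=|\(|\)|\[|\]|,)
    | (?P<WORD>[A-Za-z_]\w*)
    )""", re.VERBOSE)

def tokenize(src):
    src, tokens, pos = src.rstrip(), [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected input near offset {pos}")
        kind, text = m.lastgroup, m.group(m.lastgroup)
        if kind == "WORD" and text not in KEYWORDS | set(VARIABLES):
            raise SyntaxError(f"unknown name {text!r}")  # reject what we don't expect
        tokens.append((kind, text))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else ("EOF", "end of rule")

    def take(self, expected=None):
        kind, text = self.peek()
        if expected is not None and text != expected:
            raise SyntaxError(f"expected {expected!r}, found {text!r}")
        self.i += 1
        return kind, text

    def expr(self):  # expr := term ('or' term)*
        node = self.term()
        while self.peek() == ("WORD", "or"):
            self.take()
            node = ("or", node, self.term())
        return node

    def term(self):  # term := factor ('and' factor)*
        node = self.factor()
        while self.peek() == ("WORD", "and"):
            self.take()
            node = ("and", node, self.factor())
        return node

    def factor(self):  # factor := '(' expr ')' | comparison
        if self.peek() == ("OP", "("):
            self.take()
            node = self.expr()
            self.take(")")
            return node
        return self.comparison()

    def comparison(self):  # NAME ('=='|'!=') literal | NAME 'in' '[' literals ']'
        _, name = self.take()
        if name not in VARIABLES:
            raise SyntaxError(f"expected a variable, found {name!r}")
        _, op = self.take()
        if op in ("==", "!="):
            return (op, name, self.literal(name))
        if op == "in":
            self.take("[")
            items = [self.literal(name)]
            while self.peek() == ("OP", ","):
                self.take()
                items.append(self.literal(name))
            self.take("]")
            return (op, name, items)
        raise SyntaxError(f"expected ==, != or in after {name!r}, found {op!r}")

    def literal(self, name):
        kind, text = self.take()
        if kind not in ("NUMBER", "STRING"):
            raise SyntaxError(f"expected a literal, found {text!r}")
        value = int(text) if kind == "NUMBER" else text[1:-1]
        if not isinstance(value, VARIABLES[name]):  # semantic (type) check
            raise TypeError(f"{name} is {VARIABLES[name].__name__}, got {type(value).__name__}")
        return value

def emit(node):
    op, left, right = node
    if op in ("and", "or"):
        return f"({emit(left)} {op} {emit(right)})"
    return f"{left} {op} {right!r}"

def compile_rule(src):
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError(f"unexpected {parser.peek()[1]!r}")
    return emit(tree)
```

For example, compile_rule("asn == 123 and (ptr_org == 'foo.com' or ptr_org in ['a.com', 'b.com'])") returns the Python source "(asn == 123 and (ptr_org == 'foo.com' or ptr_org in ['a.com', 'b.com']))", while "abc == 123" and "asn == 1234 and" raise SyntaxError and "asn == '123'" raises TypeError. Because the output is plain Python, an existing eval-based runtime could consume it unchanged.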

Deployment

  • Rewrote all existing rules that used tuples to use lists
  • Integrated it into our internal authoring tool fairly simply. Before committing a rule to the db, we check it. That was deployed after we rewrote all the rules.
    • Had briefly considered storing the AST instead of the DSL, but decided against it because it would have locked us into a particular AST representation.
  • Once the runtime was rewritten to support multiple accounts, we switched it to the new compiler shortly afterwards. It was basically a drop-in replacement and added two lines.
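The slides do not say how the tuple-to-list rewrite was done. One plausible way, sketched here as an assumption, is an ast.NodeTransformer that swaps every Tuple node for a List and then unparses the tree back to source (ast.unparse needs Python 3.9 or later).

```python
import ast


class TuplesToLists(ast.NodeTransformer):
    """Replace every tuple literal with a list literal holding the same elements."""

    def visit_Tuple(self, node):
        self.generic_visit(node)  # rewrite nested tuples first
        return ast.copy_location(ast.List(elts=node.elts, ctx=node.ctx), node)


def rewrite(rule: str) -> str:
    tree = ast.parse(rule, mode="eval")
    return ast.unparse(TuplesToLists().visit(tree))


old = "ptr_org in ('hubspot.com', 'hubspotemail.net')"
new = rewrite(old)
print(new)
```

Rules that never used tuples come back with the same meaning, although unparsing may normalize spacing and quoting.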

Final Thoughts

Things I regretted

  • Not implementing a type system
  • Not making the DSL reusable in other domains
  • Not treating operators such as == and in like functions rather than as separate syntax. (Their visitor functions are discrete.)
  • Trying a functional-only approach with the parser. I found the amount of state being passed around a bit too much.

Things I liked about the approach

  • Relatively easy to debug weird rule behavior in production
  • Authoring for existing users and customers is extremely straightforward
  • Leaves us open to expansion into new ideas
  • Enables the feedback we wanted to give authors

Future Opportunities

  • Rule overlap checking: as it stands, we don't know if two authors write substantially similar rules
  • Implementing client-side, syntax- and semantics-aware auto-complete/IntelliSense
  • Making a library for implementing DSLs so that we can reuse this approach in other areas of the business

Stakeholders Involved

  • Principal Engineer for Core Product
  • Director of Engineering
  • Lead Analyst
  • Director of Deployment Services
  • VP of Product, later C-Level

Client-Driven WYSIWYG

Example of a client-side builder for conditions. Source: sentry.io

I made a DSL

By Robert Roskam
