Hexagonal Architecture, differently

An explanation from the bottom-up

Julien LENORMAND - Kaizen Solutions

Hexagonal Architecture

Who has heard of it?

Who knows what it is?

Who could explain?

Yet another explanation?

on the Kaizen Solutions blog:

French: “une introduction à l’architecture héxagonale” (“an introduction to hexagonal architecture”) by Xavier BOUVARD

(doubtful emoji)

Why so complicated?

Quiz on the prerequisites:

  • SOLID?
  • domain?
  • API vs SPI?
  • difference between “infrastructure” and “persistence”?
  • “controller” in the “Clean Architecture”?
  • reading Java code that is not written in English?

so much jargon …

A different explanatory approach

Who is a developer? We want code!

starting from a concrete and real use case/problem

mostly code, which we will comment on together

keep the “big concepts” for the end of the talk (wrap-up)

Let’s code!

Initial problem

Follow the progress of the Pull Requests, and of their reviews, for several people in the same team working on micro-services.

First solution

Quick prototyping in Python:

v1

import stashy  # library for BitBucket

login = "Julien"
token = "AbCdEf01#"
server_url = "https://bitbucket.internal.corp:1234"
project_name = "AwesomeProject"  # containing several repositories

stash = stashy.connect(server_url, login, token)

for repo_data in stash.projects[project_name].repos.list():
    repo_slug = repo_data["slug"]  # identifier
    print("Repo " + repo_slug)
    for pr_data in (stash.projects[project_name]
                    .repos[repo_slug].pull_requests.list()):
        title = pr_data["title"]
        author_name = pr_data["author"]["displayName"]
        print(f' - PR "{title}" by {author_name}')
        for reviewer_data in pr_data["reviewers"]:
            reviewer_name = reviewer_data["user"]["displayName"]
            has_approved = bool(reviewer_data["approved"])
            print(f"   - {'OK' if has_approved else '  '} {reviewer_name}")

Repo AlpesDHuez
 - PR "add foo to bar" by Elena
   - OK Gabin
   -    Julien
 - PR "remove qux from tez" by Julien
   - OK Gabin
Repo Bonneval
Repo Chamrousse
 - PR "increase pol to 4000" by Julien
   -    Elena
   -    Gabin
Repo GrandBornand
Repo Meribel
 - PR "doc for lud" by Elena
   - OK Agathe

We can make it clearer: show which actions are available!

A second solution

v2

my_id = "123456"

# inside the same loops as in v1, where `repo_slug` and `pull_request_data` are defined
pr_title = pull_request_data["title"]
pr_author_id = pull_request_data["author"]["id"]
pr_author_display_name = pull_request_data["author"]["displayName"]
i_am_reviewer = my_id in (reviewer_data["user"]["id"]
                          for reviewer_data in pull_request_data["reviewers"])

if my_id == pr_author_id:
    print(f'Repo {repo_slug} PR "{pr_title}"')
    # display the people who have NOT approved yet, or that the PR is ready to be merged
    reviewers_data_not_approved = tuple(reviewer_data
                                        for reviewer_data in pull_request_data["reviewers"]
                                        if not reviewer_data["approved"])
    if not reviewers_data_not_approved:
        print(" -> to merge")
    else:
        print("\n".join(" -> contact " + reviewer_data["user"]["displayName"]
                        for reviewer_data in reviewers_data_not_approved))
elif i_am_reviewer:
    print(f'Repo {repo_slug} PR "{pr_title}" by {pr_author_display_name}')
    print(" -> to review")

Repo AlpesDHuez PR "add foo to bar" by Elena
 -> to review
Repo AlpesDHuez PR "remove qux from tez"
 -> to merge
Repo Chamrousse PR "increase pol to 4000"
 -> contact Elena
 -> contact Gabin

LGTM

Team rules

I add:

  • 2 approvals per PR
  • checking for disapproval: “needs work”
  • checking whether the PR has been updated since my review
  • detecting forgotten PRs
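A minimal sketch of how these team rules could be turned into one decision function. It assumes each PR is a plain dict with hypothetical keys (`author_id`, `updated_at`, `reviewers` with `id`, `status`, `reviewed_at`) and a hypothetical 7-day threshold for "forgotten"; it illustrates the rules, it is not the tool's actual code.

```python
from __future__ import annotations

import datetime

REQUIRED_APPROVALS = 2  # team rule: 2 approvals per PR
FORGOTTEN_AFTER = datetime.timedelta(days=7)  # hypothetical threshold


def personal_actions(pr: dict, my_id: str, now: datetime.datetime) -> list[str]:
    """Return the actions that `my_id` should take on `pr` (hypothetical dict layout)."""
    actions = []
    if now - pr["updated_at"] > FORGOTTEN_AFTER:
        actions.append("forgotten: ping the team")
    if pr["author_id"] == my_id:
        statuses = [reviewer["status"] for reviewer in pr["reviewers"]]
        if "NEEDS_WORK" in statuses:  # a single "needs work" blocks the merge
            actions.append("fix the requested changes")
        elif statuses.count("APPROVED") >= REQUIRED_APPROVALS:
            actions.append("to merge")
        else:
            actions.append("wait for more approvals")
        return actions
    for reviewer in pr["reviewers"]:
        if reviewer["id"] != my_id:
            continue
        if reviewer["reviewed_at"] is None:
            actions.append("to review")
        elif pr["updated_at"] > reviewer["reviewed_at"]:
            actions.append("to review again: updated since my review")
    return actions


now = datetime.datetime(2024, 1, 10)
pr = {
    "author_id": "42",
    "updated_at": datetime.datetime(2024, 1, 9),
    "reviewers": [
        {"id": "123456", "status": "APPROVED", "reviewed_at": datetime.datetime(2024, 1, 8)},
        {"id": "7", "status": "UNAPPROVED", "reviewed_at": None},
    ],
}
print(personal_actions(pr, "123456", now))  # ['to review again: updated since my review']
print(personal_actions(pr, "42", now))  # ['wait for more approvals']
```

Each new rule is one more branch in this function, which is exactly the kind of growth that pushes the code towards the next versions.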

A great tool

several versions

. . .

2 distinct use cases emerge:

  • “personal” view
  • “team/global” view

from __future__ import annotations  # lets annotations refer to classes defined further down

import datetime
import enum
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Repo:
    slug: str
    name: str
    pull_requests: Sequence[PullRequest]


@dataclass
class PullRequest:
    name: str
    author: User
    reviewers: Sequence[Reviewer]
    approvals_count: int
    created_datetime: datetime.datetime
    updated_datetime: datetime.datetime


@dataclass
class User:
    bitbucket_id: str
    corporate_id: str
    display_name: str


@dataclass
class Reviewer(User):
    has_approved: bool
    approval_status: ApprovalStatus


@enum.unique
class ApprovalStatus(enum.Enum):
    UNAPPROVED = "UNAPPROVED"
    APPROVED = "APPROVED"
    NEEDS_WORK = "NEEDS_WORK"

def fetch_all_pull_requests(stash, my_project_name: str) -> Sequence[Repo]:
    ...  # similar to what we did earlier, but instead of calling `print`,
    return tuple(...)  # everything is collected into objects and returned to the caller

def print_my_personal_actions(repos: Sequence[Repo], my_id: str) -> None:
    ...  # we iterate over the `repos` parameter, and print only personal actions

def print_the_team_global_view(repos: Sequence[Repo]) -> None:
    ...  # we iterate over the `repos` parameter, and print the global view

v3

def main() -> None:
    ...  # setting the variables required
    stash = stashy.connect(my_server_url, my_login, my_token)
    repos = fetch_all_pull_requests(stash, my_project_name)
    print_my_personal_actions(repos, my_id)
    #print_the_team_global_view(repos)

Test with fake data

Serializing:

import dataclasses
import json

...  # same definitions as before

...  # parameters and creating the `stash` client
repos = fetch_all_pull_requests(stash, my_project_name)
with open("test_data.json", "wt") as test_data_file:
    # `json.dump` needs a list: it cannot serialize a generator
    json.dump([dataclasses.asdict(repo) for repo in repos], test_data_file)
    # I am hiding the pain of serializing `datetime.datetime`

Deserializing:

import json

# no `stash` needed!
with open("test_data.json", "rt") as test_data_file:
    repos_data = json.load(test_data_file)
    repos = tuple(
        Repo(
            slug=repo_data["slug"],
            name=repo_data["name"],
            pull_requests=...,  # I am hiding all the nested object instantiations;
                                # we could do something smarter, but this works for now
        ) for repo_data in repos_data
    )
print_my_personal_actions(repos, my_id)
# print_the_team_global_view(repos)

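One way to deal with the `datetime.datetime` pain hidden above, sketched on a reduced, hypothetical `PullRequest` with only two fields: give `json.dumps` a `default=` hook that writes ISO 8601 strings, and parse them back with `datetime.datetime.fromisoformat`. The helper names `to_json` and `from_json` are illustrative.

```python
from __future__ import annotations

import dataclasses
import datetime
import json


@dataclasses.dataclass
class PullRequest:  # reduced to two fields for the example
    name: str
    updated_datetime: datetime.datetime


def to_json(pull_requests: list[PullRequest]) -> str:
    def encode(value):
        # called by `json` only for values it cannot serialize natively
        if isinstance(value, datetime.datetime):
            return value.isoformat()
        raise TypeError(f"cannot serialize {type(value).__name__}")
    return json.dumps([dataclasses.asdict(pr) for pr in pull_requests], default=encode)


def from_json(text: str) -> list[PullRequest]:
    return [
        PullRequest(
            name=data["name"],
            updated_datetime=datetime.datetime.fromisoformat(data["updated_datetime"]),
        )
        for data in json.loads(text)
    ]


original = [PullRequest("add foo to bar", datetime.datetime(2024, 1, 9, 14, 30))]
text = to_json(original)
print(text)  # [{"name": "add foo to bar", "updated_datetime": "2024-01-09T14:30:00"}]
print(from_json(text) == original)  # True: the round trip preserves the data
```

The nested objects (authors, reviewers) would need the same treatment on the way back, which is the part the slide hides.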
def test__team_global_view__case07():
    # Given
    repos = load_from_file("master_record_07.json")  # what data we want to use
    
    # When
    actual_output = compute_the_team_global_view(repos)  # what is the result
    
    # Then
    expected_output = load_from_file("master_record_07.txt")  # what is the expected output of case 07
    assertStringEquals(expected_output, actual_output)  # or your testing framework's relevant method

 Repo AlpesDHuez
  - PR "add foo to bar" by Elena
-   - OK Gabin
-   -    Julien
+   - ✔️ Gabin
+   - ✳️ Julien
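The diff above is what a "golden master" (or approval) test reports when the rendering changes. A minimal sketch of a helper behind such a test, assuming master records are plain text files; the name `check_against_master_record` is hypothetical, and recording the output on the first run is one common convention, not necessarily the one used in this talk.

```python
import pathlib
import tempfile


def check_against_master_record(actual_output: str, master_record: pathlib.Path) -> None:
    """Compare `actual_output` with a stored master record, recording it on the first run."""
    if not master_record.exists():
        # nothing to compare with yet: the current output becomes the reference,
        # to be reviewed by a human before being committed
        master_record.write_text(actual_output, encoding="utf-8")
        return
    expected_output = master_record.read_text(encoding="utf-8")
    assert actual_output == expected_output, f"output differs from {master_record.name}"


with tempfile.TemporaryDirectory() as tmp_dir:
    record = pathlib.Path(tmp_dir) / "master_record_07.txt"
    check_against_master_record("Repo AlpesDHuez\n", record)  # first run: records
    check_against_master_record("Repo AlpesDHuez\n", record)  # same output: passes
    try:
        check_against_master_record("Repo Bonneval\n", record)
    except AssertionError as error:
        print(error)  # output differs from master_record_07.txt
```

When the change is intended (like the new ✔️/✳️ icons), the fix is to review and replace the master record, not the code.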

Dirty code!

v4

...  # defining some constants, some of which are
     # unused depending on the value of the flags below
load_from_file_instead_of_fetching = True
save_the_fetch_result_to_file = True
mode_team = False

if load_from_file_instead_of_fetching:
    repos = load_repos_from_json_file(test_data_filepath)
else:
    repos = fetch_all_pull_requests(stash, my_project_name)

if save_the_fetch_result_to_file:
    save_repos_into_json_file(repos, test_data_filepath)

if mode_team:
    print(compute_the_team_global_view(repos))
else:
    print(compute_my_personal_actions(repos, my_id))

Planned features:

  • render into HTML
  • add GitLab repositories
  • anticipate vacations
  • comments on the PRs
  • group by Story instead of by repo

Huge refactoring

Increasing complexity ➡️ being more demanding/rigorous

Scope change ➡️ taking the time to rethink the tool

It is an assistant for the people who work on Pull Requests: it gives them a global view of what is happening across all the repositories, and also a detailed, personal view of each PR’s progress and of their own involvement. The tool fetches data mainly from Git servers (BitBucket, GitLab, …), aggregates it with ownership data, then generates and publishes different kinds of reports, global or targeted, on different media (terminal, web, the APIs of the different servers, …).

Abstraction

class PullRequestsFetcher:  # abstract class
    def fetch_pull_requests(self) -> Sequence[PullRequest]:
        raise NotImplementedError  # abstract method, the Python way


class GitLabPullRequestsFetcher(PullRequestsFetcher):
    def __init__(self, server_url, credentials):
        ...  # specific to GitLab
    def fetch_pull_requests(self) -> Sequence[PullRequest]:
        ...  # fetch from GitLab

class BitBucketPullRequestsFetcher(PullRequestsFetcher):
    def __init__(self, server_url, credentials):
        ...  # specific to BitBucket
    def fetch_pull_requests(self) -> Sequence[PullRequest]:
        ...  # fetch from BitBucket
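The `raise NotImplementedError` idiom only fails when the method is called. As an alternative (not what the slides use), the standard `abc` module makes the contract explicit and refuses to instantiate a class that forgets a method. A sketch where pull requests are reduced to plain titles and `InMemoryPullRequestsFetcher` is a hypothetical implementation:

```python
import abc
from typing import Sequence


class PullRequestsFetcher(abc.ABC):
    @abc.abstractmethod
    def fetch_pull_requests(self) -> Sequence[str]:
        """Return the pull requests (plain titles here, to keep the sketch short)."""


class InMemoryPullRequestsFetcher(PullRequestsFetcher):
    def __init__(self, titles: Sequence[str]) -> None:
        self._titles = tuple(titles)

    def fetch_pull_requests(self) -> Sequence[str]:
        return self._titles


try:
    PullRequestsFetcher()  # an abstract class cannot be instantiated
except TypeError as error:
    print(error)

print(InMemoryPullRequestsFetcher(["add foo to bar"]).fetch_pull_requests())
# ('add foo to bar',)
```

The trade-off is one more import and a little more ceremony, in exchange for errors at instantiation time instead of deep inside a run.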

v5

def fetch_pull_requests(fetchers: Sequence[PullRequestsFetcher]) -> Sequence[PullRequest]:
    pull_requests = []
    for fetcher in fetchers:
        # each fetcher returns "GitServer-agnostic" pull request objects
        pull_requests.extend(fetcher.fetch_pull_requests())
    return tuple(pull_requests)

def main():
    fetchers = [
        GitLabPullRequestsFetcher(gitlab_url, gitlab_credentials),
        BitBucketPullRequestsFetcher(bitbucket_url, bitbucket_credentials),
    ]
    pull_requests = fetch_pull_requests(fetchers)
    ...  # and the rest
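To see the v5 idea run without any server, here is a self-contained sketch where two hypothetical `StaticFetcher` instances stand in for GitLab and BitBucket, and pull requests are reduced to two fields. This `fetch_pull_requests` simply concatenates what each fetcher returns, without knowing which server is behind it.

```python
from __future__ import annotations

import dataclasses
from typing import Sequence


@dataclasses.dataclass(frozen=True)
class PullRequest:  # "GitServer-agnostic", reduced to two fields
    server: str
    title: str


class PullRequestsFetcher:
    def fetch_pull_requests(self) -> Sequence[PullRequest]:
        raise NotImplementedError  # abstract method


class StaticFetcher(PullRequestsFetcher):
    """Hypothetical stand-in for a GitLab or BitBucket fetcher: returns fixed data."""

    def __init__(self, server: str, titles: Sequence[str]) -> None:
        self._pull_requests = tuple(PullRequest(server, title) for title in titles)

    def fetch_pull_requests(self) -> Sequence[PullRequest]:
        return self._pull_requests


def fetch_pull_requests(fetchers: Sequence[PullRequestsFetcher]) -> Sequence[PullRequest]:
    # only the abstract interface is used: any fetcher can be plugged in
    return tuple(pr for fetcher in fetchers for pr in fetcher.fetch_pull_requests())


pull_requests = fetch_pull_requests([
    StaticFetcher("gitlab", ["add foo to bar"]),
    StaticFetcher("bitbucket", ["doc for lud", "increase pol to 4000"]),
])
print([pr.title for pr in pull_requests])
# ['add foo to bar', 'doc for lud', 'increase pol to 4000']
```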

Test

class FakePullRequestsFetcher(PullRequestsFetcher):
    def __init__(self, filepath):
        self._pull_requests = load_from_file(filepath)  # reusing the same code as before
    def fetch_pull_requests(self) -> Sequence[PullRequest]:
        return self._pull_requests

def test__team_global_view__case07():  # the same test as before !
    # given
    expected_output = load_from_file("master_record_07.txt")
    fake_fetcher = FakePullRequestsFetcher("pull_requests_07.json")  # creating a `fake`
    pull_requests = fetch_pull_requests(fetchers=[fake_fetcher])  # creating test data
    # when
    actual_output = compute_the_team_global_view(pull_requests)
    # then
    assertStringEquals(expected_output, actual_output)

Test (bis)

Is testing only with fakes too fragile?

Contract testing!

def test__contract__bitbucket():
    real_fetcher = BitBucketPullRequestsFetcher(server_url, credentials)
    fake_fetcher = FakePullRequestsFetcher(test_data_filepath)  # data previously recorded from this server
    assertSequenceEquals(real_fetcher.fetch_pull_requests(),  # expected
                         fake_fetcher.fetch_pull_requests())  # actual for the tests using the fake
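Besides comparing the real and fake outputs directly, another common way to write contract tests is a single check function that every implementation must pass: the fake in the fast test suite, the real fetchers in a slower suite with network access. A minimal sketch, with pull requests reduced to hypothetical dicts, an in-memory variant of the fake, and invented properties standing in for the real contract:

```python
from __future__ import annotations

import datetime
from typing import Sequence


class PullRequestsFetcher:
    def fetch_pull_requests(self) -> Sequence[dict]:
        raise NotImplementedError  # abstract method


def check_fetcher_contract(fetcher: PullRequestsFetcher) -> None:
    """Example properties that any fetcher, real or fake, must guarantee."""
    for pr in fetcher.fetch_pull_requests():
        assert isinstance(pr["title"], str) and pr["title"], "a PR has a title"
        assert pr["created_datetime"] <= pr["updated_datetime"], "updated after created"
        assert 0 <= pr["approvals_count"] <= len(pr["reviewers"]), "approvals come from reviewers"


class FakePullRequestsFetcher(PullRequestsFetcher):
    def __init__(self, pull_requests: Sequence[dict]) -> None:
        self._pull_requests = tuple(pull_requests)

    def fetch_pull_requests(self) -> Sequence[dict]:
        return self._pull_requests


fake_fetcher = FakePullRequestsFetcher([{
    "title": "doc for lud",
    "created_datetime": datetime.datetime(2024, 1, 2),
    "updated_datetime": datetime.datetime(2024, 1, 3),
    "reviewers": ["Agathe"],
    "approvals_count": 1,
}])
check_fetcher_contract(fake_fetcher)  # fast suite: passes
# slow suite, with network access (hypothetical arguments):
# check_fetcher_contract(BitBucketPullRequestsFetcher(server_url, credentials))
```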

Refactor again!

After the fetchers, the printers!

class Reporter:
    def display(self, pull_requests) -> None:  # we are expecting a side-effect (IO)
        raise NotImplementedError  # abstract method

class StdoutGlobalReporter(Reporter):
    # an `__init__` is not even needed !
    def display(self, pull_requests) -> None:
        for pull_request in pull_requests:
            print(...)  # partly the code from `print_the_team_global_view` (v3)

class HtmlFileGlobalReporter(Reporter):
    def __init__(self, filename):
        self._filename = filename
    def display(self, pull_requests) -> None:
        with open(self._filename, "wt") as html_file:
            html_file.write(...)  

v6

def main():
    ...  # see v5
    pull_requests = fetch_pull_requests(fetchers)
    # now we can do:
    reporter = StdoutGlobalReporter()
    # or else:
    reporter = HtmlFileGlobalReporter("report.html")
    # but this part does not change:
    reporter.display(pull_requests)
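A possible variant of `StdoutGlobalReporter`, not taken from the talk: injecting the text stream keeps standard output as the default while letting a test capture the output in an `io.StringIO`. `TextStreamGlobalReporter` is a hypothetical name, and pull requests are reduced to plain titles.

```python
from __future__ import annotations

import io
import sys
from typing import Sequence, TextIO


class Reporter:
    def display(self, pull_requests: Sequence[str]) -> None:
        raise NotImplementedError  # abstract method


class TextStreamGlobalReporter(Reporter):
    def __init__(self, stream: TextIO | None = None) -> None:
        self._stream = stream  # None means "standard output"

    def display(self, pull_requests: Sequence[str]) -> None:
        # `sys.stdout` is looked up at call time, so tools that replace it still capture the output
        stream = self._stream if self._stream is not None else sys.stdout
        for title in pull_requests:
            print(f' - PR "{title}"', file=stream)


TextStreamGlobalReporter().display(["add foo to bar"])  # printed on the terminal

buffer = io.StringIO()
TextStreamGlobalReporter(buffer).display(["add foo to bar", "doc for lud"])
print(repr(buffer.getvalue()))  # ' - PR "add foo to bar"\n - PR "doc for lud"\n'
```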

And more!

All problems in computer science can be solved by another level of indirection.
(Butler Lampson, who credits David Wheeler; quoted in Beautiful Code, 2007)

All the problems in computer science can be solved by yet another level of indirection.

@dataclass
class GlobalReport:
    ...
@dataclass
class PersonalReport:
    ...

def compute_global_report(pull_requests) -> GlobalReport:
    ...
def compute_personal_report(pull_requests, person) -> PersonalReport:
    ...

class ReportPrinter:  # abstract class
    def display_global_report(self, global_report) -> None:
        raise NotImplementedError  # abstract method
    def display_personal_report(self, personal_report) -> None:
        raise NotImplementedError  # abstract method

class StdoutPrinter(ReportPrinter):  # child class
    def display_global_report(self, global_report) -> None:
        ...
    def display_personal_report(self, personal_report) -> None:
        ...

class HtmlPrinter(ReportPrinter):  # child class
    ...  # implementing the abstract methods
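A sketch of what a pure `compute_global_report` could look like, with hypothetical, reduced `PullRequest` and `GlobalReport` fields and the 2-approvals team rule as the only logic. Because it does no IO, it can be tested with a plain equality check, with no fake fetcher or printer at all.

```python
from __future__ import annotations

import dataclasses
from typing import Sequence


@dataclasses.dataclass(frozen=True)
class PullRequest:  # reduced to what this report needs
    repo: str
    title: str
    approvals_count: int


@dataclasses.dataclass(frozen=True)
class GlobalReport:
    to_merge: tuple[str, ...]
    waiting_for_review: tuple[str, ...]


def compute_global_report(pull_requests: Sequence[PullRequest],
                          required_approvals: int = 2) -> GlobalReport:
    """Pure function: no IO, the same input always gives the same output."""
    return GlobalReport(
        to_merge=tuple(pr.title for pr in pull_requests
                       if pr.approvals_count >= required_approvals),
        waiting_for_review=tuple(pr.title for pr in pull_requests
                                 if pr.approvals_count < required_approvals),
    )


report = compute_global_report([
    PullRequest("AlpesDHuez", "add foo to bar", 1),
    PullRequest("Meribel", "doc for lud", 2),
])
print(report)
# GlobalReport(to_merge=('doc for lud',), waiting_for_review=('add foo to bar',))
```

The printers then only translate a `GlobalReport` into text, HTML or JSON, which keeps the business rules in one testable place.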

v7

import itertools


def main__create_global_report(fetchers: Sequence[Fetcher], printer: ReportPrinter) -> None:
    printer.display_global_report(compute_global_report(fetch_pull_requests(fetchers)))

def main__create_personal_report(fetchers: Sequence[Fetcher], printer: ReportPrinter, person) -> None:
    printer.display_personal_report(compute_personal_report(fetch_pull_requests(fetchers), person))


def main():
    # I can do whatever I want!
    all_fetchers = [FakeFetcher(...), GitLabFetcher(...), BitBucketFetcher(...)]
    all_printers = [FakePrinter(...), StdoutPrinter(...), HtmlPrinter(...)]
    person = ...  # the person the personal reports are for
    for fetchers_count in range(1, len(all_fetchers) + 1):  # every non-empty combination
        for fetchers in itertools.combinations(all_fetchers, fetchers_count):
            for printer in all_printers:
                main__create_global_report(fetchers, printer)
                main__create_personal_report(fetchers, printer, person)

Handling evolutions:

  • a new kind of input
  • a new way to fetch
  • a new kind of output
  • a new way to print

A perfect solution?

No, not at all !

  • leaky abstractions
  • abstract classes
  • N x M problem
  • naming
  • structural complexity

Recap of the track we have followed

  • v1: quick and dirty first version, global view
  • v2: personal view (instead of the previous global one)
  • v3: re-adding the global view (2 functions, I comment/uncomment the one I want to run)
  • v4: after adding test interfaces, the main looks ugly
  • v5: a fetch_pull_requests function that uses abstract fetchers
  • v6: a reporter.display call that delegates to abstract reporters
  • v7: introducing the pure compute functions

Hexagonal architecture!

- core/  # folder containing the domain code
    - reports.py  # report generation functions
    - fetching.py  # abstract class: Fetcher
    - printing.py  # abstract class: Printer
    - tests/  # domain code tests
        - fakes.py  # Fake classes inheriting from Fetcher and Printer
- fetchers/  # folder containing the "production" classes inheriting from Fetcher
    - gitlab.py
    - bitbucket.py
    - tests/  # tests of these implementations
- printers/  # folder containing the "production" classes inheriting from Printer
    - stdout.py
    - html.py
    - tests/  # tests of these implementations
- main.py  # the app entry point

Dependencies
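The usual rule in hexagonal architecture is that dependencies point inwards: `fetchers/`, `printers/` and `main.py` import from `core/`, never the opposite. Assuming this project follows that rule, a small test could enforce it by parsing the `core/` modules with the standard `ast` module. The function name and `ADAPTER_PACKAGES` are hypothetical, and only absolute imports are checked.

```python
from __future__ import annotations

import ast

ADAPTER_PACKAGES = {"fetchers", "printers"}  # the outer folders of the tree above


def forbidden_imports(core_module_source: str) -> list[str]:
    """Return the adapter modules imported by a `core/` module (there should be none)."""
    found = []
    for node in ast.walk(ast.parse(core_module_source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module is not None:
            names = [node.module]
        else:
            continue  # not an import, or a relative import without module name
        found.extend(name for name in names if name.split(".")[0] in ADAPTER_PACKAGES)
    return found


print(forbidden_imports("from core.fetching import Fetcher\nimport json\n"))  # []
print(forbidden_imports("from fetchers.gitlab import GitLabFetcher"))  # ['fetchers.gitlab']
```

In a real test, the sources would come from something like `pathlib.Path("core").rglob("*.py")`, asserting an empty result for each file.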

Programmatic access (API)

def generate_personal_report(username,
                             gitlab_credentials, gitlab_url,
                             bitbucket_credentials, bitbucket_url,
                             output_file_path) -> None:
    from core import compute_personal_report
    from fetchers import GitLabFetcher, BitBucketFetcher
    from printers import FilePrinter
    fetchers = [
        GitLabFetcher(gitlab_url, gitlab_credentials),
        BitBucketFetcher(bitbucket_url, bitbucket_credentials),
    ]
    personal_report = compute_personal_report(username, fetchers)
    printer = FilePrinter(output_file_path)
    printer.display_personal_report(personal_report)

Interactive access (CLI)

def main():
    from argparse import ArgumentParser
    parser = ArgumentParser()
    # configure the ArgumentParser
    args = parser.parse_args()
    if args.personal_report:
        # grab the config values from the args
        # instantiate everything that is required: fetchers, printer, ...
        compute_personal_report(...)
        # ...
    elif args.global_report:
        # same
        compute_global_report(...)
        # ...
    else:
        ...  # for instance: parser.print_help()
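A sketch of the "configure the ArgumentParser" step, with hypothetical option names: `--personal-report` takes the username, and a required mutually exclusive group makes argparse itself reject asking for both reports, or for neither, before any fetcher is created.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Pull requests reports")
    report = parser.add_mutually_exclusive_group(required=True)
    report.add_argument("--personal-report", metavar="USERNAME",
                        help="actions for this person")
    report.add_argument("--global-report", action="store_true",
                        help="view of the whole team")
    parser.add_argument("--html", metavar="FILE",
                        help="write an HTML file instead of printing")
    return parser


args = build_parser().parse_args(["--personal-report", "Julien"])
print(args.personal_report, args.global_report, args.html)  # Julien False None
```

In `main`, `parser.parse_args()` without arguments reads `sys.argv`; passing an explicit list, as here, is convenient for tests.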

Web access

from flask import Flask  # simple and fast-enough web framework
from core import compute_personal_report, compute_global_report
from fetchers import GitLabFetcher, BitBucketFetcher
from printers import JsonPrinter

app = Flask("web-API to generate reports")
# ...
FETCHERS = [GitLabFetcher(GITLAB_URL, GITLAB_CREDENTIALS),
            BitBucketFetcher(BITBUCKET_URL, BITBUCKET_CREDENTIALS)]
PRINTER = JsonPrinter()

@app.route("/personal_report/<string:username>")
def create_personal_report(username: str):
    return PRINTER.display_personal_report(compute_personal_report(username, FETCHERS))

@app.route("/global_report")
def create_global_report():
    return PRINTER.display_global_report(compute_global_report(FETCHERS))

def main():
    app.run()

Photo credits

Photo by Robert Linder on Unsplash

Photo by David Clode on Unsplash

Photo by Dmitriy Demidov on Unsplash

Photo by Stephen Radford on Unsplash

Photo by Christian Wiediger on Unsplash

Photo by danilo.alvesd on Unsplash

Photo by drmakete lab on Unsplash

Julien LENORMAND

Python dev and speaker

julien.lenormand@kaizen-solutions.net julien.lenormand@non.se.com