Michael Brunton-Spall on …

Using AWS Workspaces to control access to documents

2020-04-28T10:00:00+00:00

I’ve recently worked on a project where we had to have some documents that needed to be kept reasonably secure, and on the clients computers for our project. We needed our developers to have some access to the documents, to visually inspect them, and to be able to run code on them, but we didn’t want the developers to have copies on their local laptops or computers.

We decided that AWS Workspaces would be a good fit for this usage. For those of you who haven’t used it, AWS Workspaces allows you to create desktop computers in AWS’s data centers which you can connect to via remote desktop protocol. It looks like your computer, but in fact your mouse and keyboard are attached to a remote computer. These are standard windows or linux desktop computers and the users can do anything on them that you’d expect to be able to do on a local computer, but of course nothing about the computer leaves the Amazon data center.

Locking down the data

We had a smallish set of data that we wanted to do some data science on, that is running some scripts that might parse the files, look for commonalities and so on. Of course, in order to do that the developers need to be able to look at the files, so they know what they’re looking at. We could allow individual developers to see a handful of files by using a client laptop, but that laptop is restricted in running code, so we selected AWS Workspaces as a way of triaging access.

Our first step therefore is to migrate data into somewhere accessible by the Workspaces computers. We picked an S3 bucket for this. We were able to create the S3 bucket with ACL’s set to disallow public access.

Workspaces can be deployed inside an Amazon Virtual Private Cloud (VPC), which means that we can create a private subnet that cannot be accessed from the internet, create the workspaces in there, and then configure the AWS NAT Gateway to enable the amazon client to access the machine and to allow certain data transfer out.

This means that the developer can connect to our Workspaces client, and they get bought up on a machine inside the private subnet, totally isolated from the internet. However, using AWS Gateway Endpoint, we can enable the private subnet to access the S3 bucket. On the S3 bucket, we can set an allow policy that allows access from the private subnet, and we can configure the Endpoint Gateway with an endpoint policy that allows access only to our specified bucket. This means that the devs can download files from the S3 endpoint, but cannot upload those to a new public bucket in any way.

This is pretty much all configured the way that the Workspaces documentation recommends, however we wanted a little extra security.

Authenticating the users

While it would be nice to enable MFA, requiring developers to authenticate not just with a password, but with a token from their mobile phone, this requires configuring the Managed Active Directory to work with a RADIUS server, and that’s beyond my rather basic technical ability.

But we really don’t want anyone who tries to connect to our desktops to just try passwords in a password spraying attack. I really want some form of second factor. Luckily, AWS provides something called Trusted Devices within AWS Workspaces. This means that a device requires a machine installed certificate that has been signed by a defined certificate authority.

This sounds pretty simple, we’ve got a couple of steps we need to take:

Create a certificate authority
Create certificates per device/user
Get the certificates to the developers
Let them authenticate to AWS Workspaces
???
Profit

Of course, this sounded easy, but it turned out to be much harder than it should be. The biggest problem here is that AWS Workspaces client doesn’t actually seem to provide any logging of any form. I spent a lot of time looking at an error screen telling me that it couldn’t connect with no explanation as to why.

Create a certificate authority

If we were running a large estate or big corporate installation of this, we might want to use a full PKI process. In my case, we were looking for less than 5 developers having access to infrastructure that is going to be destroyed in a few months, so we can roll out own Certificate Authority.

Our first step is to create the Certificate Authority private key: openssl genrsa -aes256 -out CA.key 2048. This private key is going to be our root key, so give it a good password. However, you are going to be typing that password a lot, so make it something you can type! I recommend using three simple words, so something like workspace signing easy might work just fine.

Next we need to create the public version of the key, openssl req -x509 -new -nodes -key CA.key -sha256 -days 1024 -out CA.pem -subj "/C=GB/ST=England/L=London/O=ORGHERE/OU=OUHERE/CN=CNHERE". You’ll want to update ORGHERE, OUHERE and CNHERE with relevant names. In reality, very little of this is going to matter, but the CN is used in a few places, so make it something memorable. projectname.local makes a good CN here.

Now you’ve got a CA.pem file, you can upload it to the AWS dashboard in AWS Workspaces and Workspaces will only allow clients that have a certificate signed by the CA to be allowed. Note that you only need to send the public portion of the key, CA.pem, to AWS. You still need to keep the private key, CA.key, secure on a local device somewhere.

Creating a client certificate

You can do this one of two ways. The easiest is to generate the certificates on the CA machine and then copy both the private and public keys to the laptop or desktop that needs to use them. The harder way is to generate the private key on the laptop and only copy the intermediate files around. The second, harder way is marginally safer, but with decent passwords and a limited risk exposure, you may be willing to use the easier method.

You’ll need to replace the word CLIENT throughout with the name of the machine or some other identifier

First we need to generate a certificate on the local machine using openssl genrsa -aes256 -out CLIENT.local.key 2048. Again, you want to set a decent password here, as anyone who gets hold of this key can authenticate to your AWS Workspaces client.

Once you’ve got a private key, you need to generate a signing request using openssl req -new -key CLIENT.local.key -out CLIENT.local.csr -subj "/C=GB/ST=MyCounty/L=MyTown/O=MyOrganisation/OU=MyOrganisationUnit/CN=CLIENT.local.client". This signing request, the csr file, is the thing that you need to get to your CA computer to be signed, so copy it over, or if you are lazy, do this all on the same machine!

Now we need to sign the certificate. This caused me a lot of pain in AWS Workspaces, I had to ask friends who had done similar things to get the exact options setup. The command is openssl x509 -req -in CLIENT.local.csr -CA CA.pem -CAkey CA.key -CAcreateserial -out CLIENT.local.crt -days 365 -sha256 -extensions v3_req -extfile ext.txt. This reads in the csr file, and the CA key, and it output a signed certificate in the crt file. It tells openssl to use the sha256 signing algorithm (needed by AWS Workspaces), and it tells it to use some v3_req extensions. This final part is really important, the whole thing wont work if you are missing this.

In order to generate the right extensions, you’ll need an ext.txt file, which should contain the following contents:

[v3_req]
keyUsage = digitalSignature
extendedKeyUsage = clientAuth

This tells the CA that it wants the certificate to be signed for client authentication and signature purposes.

You can now copy the certificate back over to your laptop.

Getting OSX to use the certificate

You’d think that this would be enough, but we also had problems getting OSX Keychain to actually import the CRT file. I’m not 100% sure that this step is necessary, but it’s what got it working for me, so I recommend it for you as well.

What we need for Keychain to easily import the certificate is to make a bundle of the key, the certificate and the public key for the CA. This bundle can be created with the pkcs12 tool as follows: openssl pkcs12 -export -aes256 -out CLIENT.local.full.pfx -inkey CLIENT.local.key -in CLIENT.local.crt -certfile CA.pem.

This will create a pfx file, which you can then double click on to import into your local keyring. If you do this, you’ll import the certificate into the users keychain, which means that whenever AWS Workspaces wants to access it, you’ll be prompted for your login password (or if you’ve got it setup, touchid). You can also import this certificate into the login keychain, which means it wont need your password to access, but that also means that anyone with access to your device can use it without your password.

Once you’ve got this in, you should be able to start the AWS Workspaces client, and it’ll authenticate properly and let you in.

I’ve created some simple scripts to create CA and client keys for you which you might find useful if you are doing the same thing.

I hope this helps you, I’m mostly impressed with Workspaces as a remote desktop solution, it does what it says on the tin, and ensures that my cloud data stays in the cloud, and not on the laptop that can be left in a pub, stolen on the train or otherwise lost.

Addendum: A better approach for MacOS

Some feedback on my approach outlined here has a more scalable approach if you are deploying to a larger number of client devices and want to run the CA slightly more securely. In this case, you’ll need to generate the CA certificates as above, but you can ask your developers to generate their certificates in a far more user friendly way.

Generating the certificate can now be done by using the Mac OS built in Certificate Assistant. Simply open up the keychain access app, and from the menu, choose Certificate Assistant > Request a certificate from a certificate authority. You need to enter the email address of you as a developer, the email of the certificate authority, and make sure that the keypair is generated as an RSA keypair (it should be the default). This will send a CSR (or certSigningRequest) file to your CA email address.

Now, as the CA, when you recieve this file, you can download it to a directory, and then sign the certificate request using openssl x509 -req -in CLIENT.local.certSigningRequest -CA CA.pem -CAkey CA.key -CAcreateserial -out CLIENT.local.crt -days 365 -sha256 -extensions v3_req -extfile ext.txt. This will generate the signed `crt’ file which can be emailed back to the developer.

Once you get your crt file back, simply double click it and your Mac will import it into your keyring and the next time you use the AWS Workspaces client, it should connect perfectly.

You’ll note that you don’t need to create the PFX file. I had a lot of problems originally with the certificates, and had to create a pfx file to get the Mac to import it, but if you do it this way, it doesn’t seem to need it. I think that’s because the key file is already imported into your local keychain, so the certificate has a match and is therefore trusted.

I’ve updated the github scripts to have a CA directory and a client directory, and there is a sign_request.sh in the CA directory to perform the above signing for you.

What is legal basis under GDPR?

2019-02-25T12:39:45+00:00

In a conversation the other day, I was trying to explain why some data couldn’t be collected and processed under “legitimate interests”. I wrote the following to try to outline the different types of legal basis that data can be collected or processed.

I thought these examples might be useful to help you decide how you are collecting and processing data. This is not legal advice, and you should consult professional advice if you have concerns. You should also read the formal guidance for the UK

Also massive props to Sunitha for being my guinea pig for all the examples. She’s a lovely human and doesn’t really deserve all these things to happen to her!

Sunitha goes to a website and puts her name and email address in to get access to a report on “data protection for you”, That was clearly active consent, and is covered under consent.

Contract

Sunitha’s data is passed by her employer to a HR firm so that they can process the HR data in order to pay her. Her employer has signed a contract with the HR firm, and that is a contractual exchange of data. Note that Sunitha doesn’t have to consent knowingly to the HR firm having the data, although she should be bound under a contract to her employer to let them do that.

Legal obligation

Sunitha’s employer is required to take a photocopy of her passport (personal data) to prove to the Home Office that they believe that she has a legitimate right to work in the country. They are required by law to do this, and therefore they don’t need her consent or a contract to do so.

Vital interests

Sunitha is found unconscious at the side of the road. The security guard who finds her looks in her wallet and discovers where she lives and her name to give to an ambulance when it arrives. The guard believes that her life or wellbeing might be in danger, and as such he has her “vital interests”, and can access that data without consent, a contract, or legal obligation.

Public Task

The Local Authority is inspecting Sunitha’s kitchen because she sells sandwiches on the side. They record that she is the owner of the kitchen and that it was clean. Her name and place of work are personal data, but the local authority can process that data because they are the official authority for maintaining food standards (as appointed by the Food Standards Authority). They don’t need consent, a contract, vital interest or legal obligation to do so.

Legitimate Interests

Finally, Sunitha is an MP who has had her expenses sent to a major newspaper. The newspaper has scanned and put all of the expenses online to allow citizens to look for illegitimate expenses and flag them. The newspaper considers that as an investigatory journalistic organisation interested in the misuse of public funds, that there is value in processing the data and making it available online, and declares that it has a “legitimate interest” in Sunitha’s expense receipts without consent, a contract, vital interest, legal obligation or public task to do so.

Brexit and data transfers

2019-02-21T16:10:15+00:00

Post brexit, there may or may not be a problem with data being transferred across borders. But all of the guidance and people talking about this seem to have some very confused concepts of terms and the processes involved, making it really hard to get clear guidance for organisations.

An example organisation

To make this clearer, let us think about a simple example of an organisational problem and see what comes out.

Imagine that we are running MBS Ltd, a news company. We scrape the internet for news articles and then put together topic pages with recent articles. As a user, you can browse the website for free, or you can create an account with your email address, and we will track what articles you read, and create you personalised pages that recommend articles that we think you’ll like.

In this case, the personal data that we collect (and we put in our privacy policy) is:

Your email address
A list of articles you clicked through to, and “thumbs up and thumbs down” ratings you give to articles

We also use cookies to track anonymous users and build lists of articles that they view or like, so we can data mine that information to determine that “people who liked article X on topic A also liked article Y on topic B”.

We built the system as a simple Ruby on Rails web application, hosted on AWS in the EU-West-1 region. We have an AWS RDS instance that stores all of the data that we use.

What is a data controller and data processor?

If you are running a company that accepts personal data, such as email addresses, from end users, and you store that information, then you are a data controller.

The data controller is the person or organisation responsible for the data, and responsible for caring for the data and doing to it what they told the user that they would do to it.

In this case, we collect personal data about the majority of users, the cookie information and logs keep IP address and preferences, as well as the details of our signed up users. We are therefore the data controller for this data.

A data processor is anybody who stores data or processes data on behalf of the data controller. In this case, for our customers, we put the data into AWS, who acts as a data processor. We remain responsible for the data while it is in AWS, but AWS processes it on our behalf according to the standard contract that we have with AWS (those Ts&Cs that our CTO clicked through without reading when we first setup the startup).

Offshoring

Offshoring is a tricky subject, and Joel has written more about this, based on some comments of mine. But essentially, if we are a British company, then we have the following legal agreements.

We have a contract with AWS EMEA SARL, a company registered in Luxembourg that says that we allow them to process our data on our behalf. AWS says that they are processor who will only act on our instructions

In essence, AWS says that it’s our responsibility to ensure that we have a legal right to give the data to AWS, and that they may reproduce the data as needed to move it around the internals of AWS.

We’ve selected the EU region, which means that our data is being held in servers in Ireland. AWS says that it has staff all round the world who potentially help administer their services on their behalf.

Current legal status

Currently we have a privacy policy and consent from our users to hold their data. Because the UK is within the EEA, the data is being held in the EEA, but an EEA company and not being transferred in or out of the EEA, so everything is fairly simple.

What happens if we leave without a deal?

In the event of the UK leaving the EU without a deal, things start to get a little tricky.

Firstly, the UK has already said that they will create a “finding of adequacy” for the EEA that says that it is safe and acceptable for UK companies to transfer UK citizen data to an EEA company to be processed. This means that our basic system is fine, we can still transfer data into AWS Europe under the same process as we did before.

The EU has not said that they will necessarily create a finding of adequacy in the other direction, so it may not be possible for an EU company to transfer data into the UK without consent from the users. In our case this is probably fine, because we don’t receive information as a data processor from another company in europe.

End users are still allowed to make their own decisions about transferring data, so an EU citizen who choses to sign up to our service is allowed to hand their data to a UK company, providing our privacy policy and consent clauses are clear to them that they are doing so. This is a lawful basis of processing based on consent.

The problematic part is that data transfers from the EU to the UK might be forbidden by the EU GDPR since the UK will be outside the EEA and won’t have a finding of adequacy.

What is a data transfer?

The key question here is about what constitutes a data transfer. The guidance on international transfers is still quite vague, and has holes you can drive a truck through, for example

Personal data is transferred from a controller in France to a controller in Ireland (both countries in the EEA) via a server in Australia. There is no intention that the personal data will be accessed or manipulated while it is in Australia. Therefore the transfer is only to Ireland.

This use of the words “server” or “intention” don’t make any sense to me. How one measures the intent for data to be accessed or manipulated by a “server”, given that any computer data transfer must involve some level of packet technology that will manipulate the data, store it on a drive, and then access the data to serve it back up. Additionally, storing data on a server outside the EEA that you “don’t intend to be accessed” doesn’t sound like a good defense if the said server is hacked somehow! It certainly doesn’t match up with the ICO’s definition of “processing”.

It’s also worth noting that you can transfer data internationally under a number of different arrangements. What we are worrying about here is the specific transfer of data without consent, without a “legal instrument” and without any of the other exemptions to international transfers. Since using a cloud service provider is commonly perceived to be one of these restricted transfers, that’s what we are worrying about here.

Anyway, if your data is stored in AWS Europe, and your staff in the UK are accessing that data to do their jobs, it’s slightly questionable whether this is a transfer. Reading the guidance, this is clearly not the intention, transfers are clearly meant to be “a transfer for the purposes of processing by a third party”, and access by the original data controller is not a transfer. However, this isn’t made clear and nobody is making it any clearer.

Secondly, it’s entirely unclear whether data provided by EU citizens, who thought it would be stored by a company based inside the EU, are content that when the UK leaves the EU without a deal, that their data can continue to be accessed by the company. If your original privacy policy and consent made clear that this was stored by a UK company, you might be fine, but if you had general terms that just stated that it was a EU organisation that kept data inside the EEA, it might be arguable that the UK should no longer have access post brexit.

What about the US?

The EU made a finding of adequacy around US based companies providing:

They are certified under the EU-US privacy shield
They provide a method of legal redress

The first is pretty simple, since Privacy Shield is a self certification process from 2016 onwards, almost any US company that wants to do business with EU citizens or process data on behalf of EU companies has self certified.

The legal redress is a bit harder, because privacy shield says that the organisations must comply with the EU data protection authorities, and if the UK leaves the EU without a deal, then there may be no access for UK citizens to the EU data protection authorities.

Arguably, the terms at the time clearly meant that US companies should comply with the UK ICO (who is the UK’s data protection authority), but this still is legally unclear, and legally unclear is not a very happy place to be.

So known unknowns?

As it stands right now, there are two major unknowns that we don’t have clarity on.

Can a UK company access data held on EU citizens, that was until now held in the EU once the UK leaves the EU? Will it be different if that data is transfered to the UK now vs if you access the data across the border post brexit?
Can a UK company use the EU-US Privacy Shield agreement post brexit without additional modification?

What if I move my data to AWS UK region?

Insourcing your data to a different processor based on UK soil before the brexit date would currently be legal, it’s a transfer within the EEA, and a perfectly acceptable form of restricted transfer. Once it’s in the UK, it seems to remain legally acceptable to access it after a no deal exit, and while transfers might be forbidden after an exit, EU citizens can still choose to give their data to you. So if you buy data from an EU company that contains personal data, you can’t conduct a restricted transfer post-brexit (although as before there are other clauses such as consent, contracts and others that you may be able to transfer it under).

However, using AWS UK (or other major cloud providers for that matter) may not solve the problem. Why? Because your contract is with AWS EMEA SARL, a Luxembourg, and thus European country, not with a UK company. While the data might rest on servers in the UK, it’s still definitely under the control of an EU company, and therefore accessing the data could still constitute a transfer from an EU company to a UK company.

The only way you could do this successfully right now would be to host the data with a UK based company who owns servers in the UK, rather than a US/EU company with servers in the UK.

I suspect that a UK company could manage servers in the EU, and this would be far more acceptable than the other way round, as per the transfer example, providing the data was never intended to be accessed or processed in the EU, it would not have been transferred.

This really does tell you that where the data is physically located is of almost no relevance in the ICO’s eyes, it’s entirely about where the organisation is legally based who does the processing, and what the intent is.

Discovery, Alpha, Beta, Live… Part 2

2018-12-09T12:00:42+00:00

This seems to have come up again, with discussions about what the purpose of a discovery, alpha beta actually is, and when you should build your MVP.

TLDR

Discoveries help you identify problems. There are no products, no MVP’s, and almost nothing resembling software development or delivery as typically imagined in this phase.

Once you have a known problem, you need to understand the potential solution space. GDS calls this an Alpha, I prefer to call it prototyping. There are no real products, in fact in many cases, you are rapidly experimenting. Development if it happens at all is rapid hypothesis driven development design purely to get the answer to a single hypothesis. “Does a user know their identification number?”, “If we ask for monthly salary, does that make sense to people paid weekly”. You iterate on hypothesis here, trying to derisk the final product and have a real proposal for what to build

Once you have a desired and tested solution (even if lightly), you can move onto starting to build a product. GDS calls this a Beta, but I think that covers a swathe of development. I’d say the first 2–6 weeks are building the alpha MVP. At this stage you are building a team, building the production capability (hosting, security, build pipelines etc). You are also planning what features are minimum, and it’s the chance to work out what order you’ll approach those in. I’d have called this an “Alpha” phase, but that concept is taken, so I think of this as MVP phase. For very large projects (Universal Credit size), this could be months, but I find you can’t keep teams in this early place for long, they want to get on a deliver, so next you move onto

Building and releasing your MVP. This for me is the now descoped “Private Beta”. This is the phase where you work out just how minimal your product can be to launch, and whether your users will consider that minimal feature set to be viable. Once you have that MVP, you need to get it in front of real users and start getting feedback, so you should start with 1 to 5 invited users, and then maybe roll up to maybe a few hundred.

Once this phase is done and you have users who are using your MVP, you are now into the more complex phase of product delivery, because you have two streams of work, a set of findings from your real users on the ways that they use your product or service, and a set of epics or further features that you always planned to build. You are now building and delivering features to a live audience, and probably growing that audience as well. Your team needs to change the way it operates because breaking the system will bother users. Note you might be in a private beta phase of your product, but live to real users (If only GDS had used different words to avoid this confusion 😭)

Finally you are ready for any number of users to join your service, so you throw the doors open. You are now in a public beta, you still may not have the entire feature set, but anyone can use this service. You are still in active development, and you are still building features from the backlog and gathering user feedback.

At some point, the feedback starts to dry up, you have built all the features you originally promised, and the need for an active development team starts to go away. You need to continue to operate the service, and you need to continue to make changes. There will be security patches, there will still be points of user frustration and of course, there will always be stakeholders with opinions on their favourite feature that they insist on for no discernable data backed reason. Your team is going to look very different now, but your service is Live, the beta banner can go away and the rate of change should drop considerably.

Finally, something else comes along to replace your service, and you migrate users over to it and tear down your service, having a requirements document of backlog burning ceremony and all going on to new and better things.

The Phases, Team and Budget structures

Discovery

During a discovery, the aim for the team is to understand whether a problem hypothesis is actually a problem. What are the problems that users face, who already exists to solve that problem, and why are they not meeting user needs.

A critical question during the discovery should be “Is this a problem that Government should solve?” because there are many problems that exist, that Government is not the right place to solve the problem directly. Instead recommendations might be made to change policies, adjust funding, or share research and data with the charitable sector

Funding

A discovery pot for n discoveries per year

Team Structure

Small fast moving teams of around 3 -5 people. Skills will include delivery management, business analysis, user research and traditional research and analysis at least. Data science may come in handy here depending on the data available to you.

How does work come in

A backlog of potential problems, prioritised by the business in terms of where user needs are not being met, number of complaints and identified waste. Discoveries are selected from the backlog and completed

What happens after

Discovery outputs should be put into a discovery library. A recommendation should be written about whether there are potential solutions and placed into the Prototyping backlog

Prototyping

Carrying out a set of prototypes to explore the potential solution space for problems identified at discovery. You could combine Discovery and Prototyping, especially if you have only a small team, which would increase learning, speed and efficiency, but make it far more likely that people get invested in the problem/solution. Fresh eyes will take a more critical look at the problem space and might determine solutions that the discovery team didn’t recognise

Funding

Via the discovery pot or a prototyping pot, funding for x prototypes per year

Team Structure

Small fast moving teams that can explore the solution space. A product manager, an interaction designer, a user researcher are all key to attempting to come up with prototypes. A developer or two at this stage can help build more high fidelity prototypes, but must be careful not to over complicate the prototype.

How does work come in

Outputs from discoveries should recommend prototyping and testing some solutions with users. These should be prioritised by the organisation based on the identified need, the ability of the organisation to meet the need and the capacity of the organisation to build something after. There’s no point starting a prototype if the organisation just cannot possibly make that prototype a reality later

What happens after

Prototypes should prove whether an idea is possible and whether it can meet the user need. Do we have the data, the technical capability to build this, and if we do will it meet the user need.

At the end of the prototyping phase, the collected learnings should be documented and stored in an archive. A rough outline of an MVP type program could be created, along with estimates for time, money and skills needed. This should be enough to build an outline business case for an MVP

MVP

The MVP is the first attempt to actually build a production version of a prototype. Some teams may want to get started fast by taking the prototype and modifying it, but it’s important to remember that to be viable, the code built in this phase has to meet all the robustness and reliability requirements of production software. That might not be possible with the prototype code, and I personally encourage throwing it away and starting again.

Teams will need to consider technical details like hosting, code sharing, developer access, analytics, logging, failure detection, as well as getting designers, user researchers and product managers to produce a working product.

Funding

The outline business case from the prototype phase should recommend a project funding structure. This should include the breakdown of whats needed for the MVP to get to a private beta.

How does work come in

The prototypes generated at the end of prototyping should be demo’d in show and tells and the outline business case considered by the organisations funding models. If the business case is approved, then the project will start.

Internally the team should have an “iteration-0” or “inception” process whereby they discuss the features and use a process such as MoSCoW to find the minimum feature set. This initial work should produce a backlog of “epics”, and the team can start estimating those, and breaking them down into actionable stories for the team.

What happens after

An MVP could be found to be unviable. When put in front of users during user testing, one of the fundamental hypothesis of the discovery and prototype could be found to be fundamentally flawed. If this is the case, the project should be stopped and the discovery or prototype documentation should be updated with this information.

If not, if the MVP is proven successful, then the full business case to roll the MVP out as a full service should be built and approved. This should happen while the MVP is being built, and if prioritised, the team should continue on from the MVP.

Build

This is about building the full service. The MVP may well have entered a private beta, but if not, it should at this point, so that the team can gather feedback on what is working.

The team will continue building and iterating on the product while simultaneously operating the product and ensuring that it is stable, secure and successful

Funding

This should be project funding, for larger projects over several years, all covered with a full business case.

Team Structure

This is your typical delivery team, so all the skills are probably necessary

How does work come in

The Epics that were deprioritised during the MVP should all be on the backlog and for the primary product delivery process. However the team should be gathering feedback from the users using the service and creating new work to iterate on what they have built. That learning should be shared with the team and the backlog should be regularly re-examined and prioritised.

What happens after

The service goes live, the crowd erupts with applause and everybody gets a drink…

No, seriously, the service is live, so the estimate is that the bulk of the delivery is done. The project should end and leave in place a service and a service management team of some form (more next)

Live

During live, the service is running, and you need to be able continue to support it during operational incidents, but also to maintain it, improve it and continue to review feedback

Funding

Continual operation fund and money to pay for the continual improvement

Team Structure

This is the real “it depends”. It really depends on the size of your service, and the size of your organisation.

There are two primary models for in-house maintenance teams.

Firstly you can have a team in existence to continually maintain the service. This is similar to the GOV.UK model, the CMS is never done, there will always be more than enough work to fund an entire team to maintain the system. What you save on RFQ’s and change request charges with your outsourced model you replace with salary costs for keeping this team in place.

If your service is small and the user need doesn’t change very much, then your service might be “done”. It will need maintenance, and there will be small feature requests, but probably not enough to require an entire team. Instead in this structure, you build a maintenance team who maintain a number of services in their portfolio. This team will need to be able to review the analytics, respond to user requests and patch and maintain the software, infrastructure and system in perpetuity.

How does work come in

The service will continue to produce analytics, user feedback and should be subject to user research still to ensure that it is meeting the users needs. Their context changes over time, and the service may need to change with it.

Additionally, the software will begin to age immediately, and will need continuous refreshing to ensure that it is maintained and healthy

What happens after

If the service is no longer needed, you can kill it, salt the ground and start all over again 😂

Getting started in programming with Advent of Code and Python

2018-12-03T10:47:29+00:00

Have you always wanted to program? Have you been interested in the dark and mysterious ways of development? Maybe you’ve done some reading, done a bit of practice, but haven’t been able to find the motivation or the right kind of thing?

For me, development and programming is all about problem solving. I love a good problem, and I like thinking of ways to solve them. One of the reasons that I don’t program enough, apart from being a manager these days, is that finding arbitrary problems to solve isn’t very easy.

That’s why I love Advent of Code. It’s an annual challenge that pits you against a set of programming problems, but each one is scripted against a Christmas themed story. There’s a community of people who approach these problems for a variety of different reasons, from wanting to be first on the scoreboard, to wanting to learn how to approach problems in a brand new language.

I like to do them because the problems are somewhat abstract, but also grounded in a fun story. They are often slightly mathematical, in that there are some very good ways to solve them efficiently, but most of them can be solved in very naive ways as well, which is great for learning and practicing.

So, assuming you’d like a go, I’m going to walk through how you might get started this year. I personally like to use the programming language Python to solve these kinds of problems. It’s simple to learn, it has built in most of the functionality you will need, and it’s preinstalled on a Mac, so there’s very little fiddling around to do to get started.

So lets go

Install anything extra you need

You are going to need 3 things to start day 1.

A text editor
Python installed
A github account

A text editor

You can choose a text editor of your liking. On a mac, you can press Apple-Space and type ‘textedit’ and you’ll get the built in text editor. On Windows you can use notepad.

While textedit or notepad will do the job, you’ll find life a lot easier if you install a programmers text editor. Why? Because they understand that you are trying to program and that’s different to just typing normal words.

Python code in TextEdit on the Mac

I’d recommend using something like Atom or Microsofts VSCode. Both are completely free, simple to install and both come out of the box with something called “Syntax Highlighting”. That means that it will put any reserved words or symbols into a different colour.

The primary help for a beginner with syntax highlighting is that it will warn you if you forget to close a bracket or a quote mark somewhere, because all the following text will be a single colour. It’s very helpful, so go to one of those sites and install that software, it’s simple and easy.

The same code in Atom. See the colours showing strings, special words etc

Both should install just fine on windows or Mac (I use a mac, so the windows instructions in here won’t be as good I’m afraid)

Installing Python

Next we need python installed. Luckily for Mac users, python comes pre-installed on a mac. Press Apple-Space and type “terminal” and you should get a terminal window come up. In that window type “python” and you should see something like this. (This is called the REPL)

Python on the mac, ignore the /usr/bin bit

At this prompt you can just type your code, and see what works. Type 2+3 and it should print 5 for example.

Hold down the ctrl key and press D to quit the REPL when you are done

If you are using windows, go to the Python website and follow the instructions

While you can use the Python REPL, it’s not recommended. What we do most of the time is type our commands into a text file, and then run the file using python. This makes it easy to correct the code if you make a mistake, which you will do, and run it again.

Every developer is different on how they arrange their code, so you need to work out what works for you. I personally have a directory in my home directory called “work” and in there is a subdirectory for each project I work on. I created a directory for the advent of code 2018, and then I create a directory for each day.

But before we get there, we should discuss version control.

A github account

In order to get into the advent of code you need to sign in. They let you sign in with Gmail, Twitter and your Microsoft account if you want, but you should take this opportunity to get yourself a github account if you don’t have one already.

Github is where almost all developers in the world store their code. There are alternatives, but it’s a good start. If you ever want a job as a developer, a lot of places will look to see if you have a Github account as part of the screening process.

To sign up, go to Github and hit the sign up button. Come up with a name for yourself. Remember that this might be your professional identity in future, so keep it clean. For years I was MIB because I only wore black, but as I realised that this would be professional, I changed my online presence to be bruntonspall everywhere. It doesn’t have to be your name, and for most people probably shouldn’t be.

Now, we have a choice at this point. I would recommend keeping all of your code on Github and regularly pushing and storing it there as you go. But I don’t want to cover using Git and GitHub in this tutorial, so if you want to do that, go read something like this or get a friend to help.

If you are using github, then create a repository called AdventOfCode2018 and then clone it into your work directory. If not, from that terminal, type mkdir work to create the work directory, and then cd work and mkdir AdventOfCode2018

Also, go to the Advent of Code website, sign in with Github and click day one and you should be able to read the problem

Lets get started with Python

Right, we’re going to program now. Before we start properly, I’m going to suggest the way that I structure my code. There are other ways of doing this, but this works well for me for this kind of coding problem.

Firstly, I use something called unit testing to explore the problem. Python comes with a module of code called unittest which does most of the hard work for you, and makes it easy to describe tests.

I write tests for my code before I implement them, and then I run the tests to be sure they fail for the right reason. As I go through the problem, everytime I encounter a bug or weirdness, I create a new test to document what I think should happen.

Lets take day one as an example.

We have been given a set of numbers to sum up. These look like +1, -2, +1 and we need to add them all together and get the correct number.

So I created my first test to look like this

def test_basic_solve(self):  
    data = [1, 1, -2]  
    self.assertEqual(0, puzzle.solve(data))

Ignore the syntax and how it looks, the critical thing here is to see how the test is structured.

First I create the data in the form I want. (We’ll cover the syntax in a bit)

Then I use a method called assertEqual to compare my expected result 0, with the result of calling my code puzzle.solve(data).

If my code didn’t account for negative numbers correctly, then this would fail my test. If it didn’t count correctly, it would fail the test, and so on.

For my first try, I made the puzzle.solve just simply always return -1, so my test failed. I can now write the code in puzzle.solve and know that when I get it right, the test will pass

I run the tests by typing python test_puzzle.py in my terminal, and it shows me which ones have passed or failed.

Structuring your code

Python allows you to structure your code into 3 different concepts, modules, classes and functions. I use all three in my template code, and I’m going to explain how and why that works.

Firstly, in python a file contains code into a module. If I create a file called myfoo.py that has some code in it, then in another file, say mybar.py I need to bring that code into scope to be able to call it. I do this by putting an import statement at the top of my code. import myfoo will bring that code into scope for my file.

Inside a file, code in there is run when the file is either called by typing python filename or when the code in a file that is run that way says import myfoo.

Normally we don’t want to do that, so we bundle our code up into functions. A function is a piece of code that takes some arguments and normally returns a value. It’s a good way to divide up your code into neat segments that do different things.

In python we define a function using the def keyword, and the format looks a bit like this:

def my_func(a, b):  
    return a+b

In this case we’ve defined a function called my_func. We’ve said it takes two arguments, which we’ve called a and b.

When the function is called, you can essentially assume that we replace a throughout the indented code with the first argument and b with the second argument. So if we call the function by writing elsewhere my_func(2,3) then a would be replaced with 2 and b replaced 3.

Return is a special keyword that says stop processing the function and return back the value on the right. In this case, the sum of a and b together (or 5).

Finally classes. I don’t use classes in Python very much for this kind of coding because we tend not to be building systems that are complex enough to need them.

But the testing framework requires one to exist, but I’m not going to explain it, just show you what I do.

Here’s puzzle.py

def parse(lines):  
    return None

def solve(data):  
    return -1

Here’s the test_puzzle.py file

import unittest  
import puzzle

class TestBasic(unittest.TestCase):  
    def test_pass(self):  
        data = puzzle.parse(file("input.txt").readlines())  
        answer = puzzle.solve(data)  
        self.assertEqual(0, answer)

if __name__ == '__main__':  
    unittest.main()

This is my basic template, I start each days puzzle with this code.

The puzzle.py defines two functions, because I split my solutions into two main parts. The first is to parse (that’s read and understand) the input file into a form or structure that I can use.

The second is to solve the puzzle with the input structure.

In the test code we do 3 things.

Firstly we import the unittest module and our puzzle file into scope. Where we say puzzle.something that will call a function in the puzzle file

Next we create a class to hold our tests. This can just be copied as is, but essentially it declares a class that inherits from the unittest.TestCase class, which the unittest framework uses to provide some features such as the asserts, and also to magically find our tests

Inside the class, we create a function called test_pass. It takes a single argument, which you can ignore, but that “self” is a standard python thing, we’ll use it in a minute

First we call the parse function in the puzzle module, and we use some of pythons built in modules to read the file and divide it into lines. If you create a file called input.txt and put some lines of stuff there, then that’s what would get sent to the parse function.

We take the return value of the parse function and call it data.

We then pass the data to the solve function and take that return value and call it answer.

We then compare the answer to 0 to see what it is.

Finally at the bottom of the file, there’s some code that says if this file is being run by calling python <filename> then run the unittest frameworks main function. It’s this that finds and runs all my tests

If you have those files, and create a file called input.txt (which can be empty) and type python test_puzzle.py it will run the tests and should fail, as below

This failed because we said we wanted to get a 0, but the puzzle.solve function returns a -1

I store all this code in a folder called ‘template’ and I simply copy it into a folder called ‘day1’ to try to solve day 1, and repeat each day. This makes it nice and easy to start again each day

Solving Day 1, Part 1

Ok, so the device starts with a frequency of 0, and we get given a list of numbers +1, -2 etc, and we want to know what frequency we end up on.

We could just model this exactly as described, take each number and apply it to our current frequency, and that’s probably the easiest way of doing so.

Mathematically, hopefully you remember that adding a negative number to something is the same as taking away the number? That means that (2)+(-1) should equal 1. That means we don’t need to do anything different for the + or -letters, we just need to turn them into the right kind of number.

This is what I mean by separating the puzzle into parse and solve parts. Our first problem is to take a text file filled with “+1, -3” and turn those into numbers that we can do maths on.

Python has a set of handy functions for this stuff, called the Python Standard Library, and in this case if you click the link to “numeric types”, you should see that there is a function called int(x), which is defined as “x converted to integer”.

Lets’ test this theory with a simple test,

def test_basic_parse(self):  
        data = puzzle.parse('''+1  
        +3  
        +2'''.split())  
        self.assertEqual([1, 3, 2], data)

If we add this code into our testcase (being careful around indentation, it should be indented one step from the class, the same as the other def test… code.

This is going to try using the parse function on the string

+1  
+3  
+2

Note that the ‘’’ characters start a “multiline text string”, which is important as it remembers the line breaks in between each character. The function expects a list of lines, so we need to call the split function before passing it into the parse function.

We pass that information to the parse function and expect back a list of the numbers [1, 3, 2]

If we run this test, it should fail and tell us that parse returned None, which is right, because we haven’t written that yet!

So let’s write that code shall we?

def parse(lines):  
    ans = \[\]  
    for line in lines:  
        ans.append(int(line))  
    return ans

We’re going to do this nice and simple. First we create an empty list. We can add things to the list using the append function.

Next we step through each line in the lines of data passed to us, and we turn it into a number and add the number to the list.

Let’s try running that test.

If we did this right, you should find that the test passed, but the original test is still failing. In unittests, you’ll get an output at the top, with passing tests represented by a . and failing tests with an F, so .F means that one passed and one failed.

So now we can turn the file into numbers, we can start doing maths on them!

Our solve function is going to be passed the list of numbers and needs to return the end frequency.

We could do this by running the real one, but let’s go slow. The AdventOfCode gives us some samples with known answers, so let’s try one. Add the following code to the test_puzzle.py file

def test_basic_solve(self):  
    data = [1, 1, -2]  
    self.assertEqual(0, puzzle.solve(data))

Re-running the tests should give you 3 failing tests, so let’s see if we can write the solve function.

We want to step through each number and add it to the total so far. The simplest possible code that can solve this is very similar the bit we just wrote:

def solve(data):  
    total = 0  
    for num in data:  
        total = total + num  
    return total

Now we’ve got a passing sample test, and our final test is showing something interesting. What’s that 599 answer? Well it’s the answer for my version of the code problem. Your number will be different because the Advent Of Code wants you to solve your own problems, not just look up the answers.

If you type your number into the Advent of Code website, you’ll have solved the first day!

Also, make that test pass, by writing the number 599 into the test, and you are good to go.

Part 2

I’m not going to solve day 1 part 2 for you, but it’s a more complex problem to solve.

I recommend starting by using a piece of paper and seeing how you would detect that you’ve hit the frequency a second time. I think the most likely thing is to keep a record of each frequency that you get to. A python dictionary with the key being the frequency and the value just being a simple “True” value should enable you to detect if you’ve seen that frequency before and promptly stop processing.

You’ll also need to have a go with the samples and make sure that your code will loop round the values more than once.

Best of luck, and I hope you enjoy it

You can get the code I’ve written over at my github page, although don’t look ahead to problems you haven’t solved. You can see people discussing the problems on the Advent of Code reddit page, and ask anyone you know to help

The basics are sometimes the hardest things

2018-11-07T23:06:44+00:00

If only we could apply patches, then we could do more interesting security work.

I’ve as guilty as the next person of over-simplifying how easy the basics actually are. We exhort and condemn organisations and people for not getting the basics right. “Having a company vision is just table stakes”, but we forget how how hard it can actually be to execute the basics well.

If we take patching, it kind of sounds easy to just apply patches to our software. But to do that we need to know what software we have deployed, and the moment you scale up from one team to 5 teams, you are going to lose that visibility.

Even if I can keep some visibility of what systems I have, I also need to pay attention to what software I’m running, what version it is, and what patches are actually available.

With multiple service delivery teams, each using a slightly different technology stack, we now need to track whether we are patching ruby, node.js, linux, windows, vmWare, Xen and a whole host of other software.

If we add in the complexity of a typical corporate estate, desktops, email, productivity tools, mail clients, instant messenger tools, collaboration tools as well as whatever else has been installed to help our service teams deliver, patching turns out to be a lot harder in practice than it sounds as a trite soundbite.

We should patch early, and patch often, and we should aim to get the basics right before we start investing in the machine learning quantum advanced threat protection that the vendor wants to sell us, but we shouldn’t deride people for finding it hard.

[This blogpost is part of an attempt to blog once a day for the entirety of November (#NaBloPoMo), inspired by Terence Eden. This series of blogposts are therefore unedited and written on the day. Errors and Omissions Excepted]

Nudge or strategy?

2018-11-06T20:49:23+00:00

Should you nudge your users into better behaviour, a small microaction at a time or do you need large sweeping changes to change behaviours?

This debate seems to have gone on for decades, in all manners of locations, from healthcare to technical architecture, economics to the adoption of devops.

I tend to favour the former, preferring to use nudge economics to slowly change direction for teams, but recognise the need for the latter sometimes.

But it seems like sometimes people are zealous adherants of a certain method and refuse to acknowledge that the other way might work.

In the conversation around password complexity, nudge zealots argue that by reducing the complexity of passwords, you increase the ability of users to remember them, and create better user behaviours for password hygiene. Whereas some people yell that decreasing password complexity even with increased length decreases the overall entropy in the system.

There are times where you need both methods in your toolbox, where you need to know when to apply which strategy.

Due to my background, I’ve always likened this to the problem of Path-finding in AI in computer games, and what’s called the hill climbing problem.

In this system, we have a map which has varying elevations, green being lowest and red as the highest.

A map with some hills

A very simple algorithm is to look at the neighbouring cells and if you find cells that are higher than you, you can go there, if not, pick one of the cells at random and go that way.

However this will result in a problem on the map above, which is that anybody who starts on the right of the small hill will climb to the top of the small hill and go no further.

A better algorithm might be to search the map for the highest point, and flow down hill as much as possible while reducing the distance from the current point to the start point. This would result in this map

I view nudge actions as very similar to the first algorithm. They’ll get you so far, and will get you moving. They’re easy, efficient and simple. But they wont get you to your ultimate goal.

The strategic route takes more effort, more time and more thought, but it will get you to your ultimate goal in the end.

But really you need both practices in your toolbox, you need to be able to action immediately, and nudge your team to better practices, but you also need an eye on the strategic options open to you and know when to take more severe action.

Language forms mental furrows

2018-11-06T01:14:10+00:00

When I worked at GDS, I worked with a lot of people who got very specific about their language. We talked about users, not customers; user needs not requirements and clear plain english where possible.

I was never very good at this, I have a tendency to use 20 words where one will do, and I’ve never been terribly concise or consistent about my language.

But the language that we use on a daily basis really matters. Why?Because language forms the habitual furrows into which our thoughts get organised.

If I am asked to write a resource utilisation plan for my development team, I’m likely to think of my staff as fungible resources that can be re-allocated at will. If I talk about the requirements for a system, then I get focused on producing the best requirements of the system rather than asking what the users actually need from the system.

This isn’t always true, there are many gifted and talented people that I’ve met, who are perfectly capable of talking about a resource plan, meaning their staff, and keeping in mind that Karen is a member of their team with opinions, hopes and dreams and not simply a Java programmer. But the reality is that when faced with a problem, many of us try to solve it, and if the problem is phrased in a certain way, we’ll tend to solve it with a certain mental headspace.

Even worse, this becomes habit over time, so this language forms habits for us. Asking “what is the user need?” becomes a knee jerk response to someone who says the word requirements, regardless of their intention or meaning.

If you want to break yourself of this habit, then the first approach to many problems has to be to rephrase the problem. Instead of solving the problem set in front of you, ask yourself how to rephrase the question and see if that helps shift your thinking

Organisations are made of people

2018-11-04T22:23:01+00:00

Does Google value your privacy? How about Facebook? Your bank cares about you we are told.

We often forget that organisations are not entities in themselves but are a contingent of people acting in broadly the same way.

When I worked for GDS I was often asked what “GDS thought” about a specific topic. What is GDS’s position on serverless? How about single page applications?

GDS as an organisation produced a service manual, which was the result of much internal discussion, but was also in many sentences, the result of a single person expressing their views and waiting for challenge.

An organisation cannot really hold a view, and cannot have a position. It’s leaders might consider that the organisation stands for something, but in reality, organisations are made up of highly varying people all with different views.

When you next talk to someone claiming to express their organisations view, or know what an organisation would say, you need to ask yourself whether they are simply expressing their own views, reinforced through their perception of the organisation.

Build and Deploy are different concerns

2018-11-03T23:39:41+00:00

You’ve spent days crafting the perfect bit of code, and you are ready to put it in front of real users.

You hit the build button, and some process takes your code and all the tests and makes sure that it works. In the same place, you hit the deploy button and the code is compiled and shipped to a production machine and everything works just fine right?

It’s 2am, and you are on call. Someone is ringing. You pick up and are told that the site search system isn’t working anymore. Nobody has changed this in months, but something isn’t working. You log in and look at the graphs. Memory use and disk use has been steadily climbing for months, and now the service can’t run. You restart the service but it doesn’t help, it just reloads the indexes from disk. In a fit of pique, you look at the changelog, and yup the last deploy matches with the steadily climbing memory usage, so a rollback will fix your problem.

You hit the deploy button to redeploy the app, but the deploy fails. During the deployment process, the system tried to download an old dependency that doesn’t exist anymore. You search fruitlessly, but trying to roll forward the dependency means branching the code, and the updated version of the library has compatibility issues.

This shouldn’t ever happen. A deployment shouldn’t require rebuilding the thing you are deploying. It should be constant and require no external dependencies.

A build process is exactly what it says, designed to build a relocatable binary package of some form. In the C and C++ days, this meant doing the compile and the link phase, to produce an executable.

But in Python or Ruby, JVM’s and .NET, the blurring of the boundary between compile time dependency and runtime dependency has been blurred. Do you need libssh installed by the operating system or do you do pip install -r requirements.txt at deploy time?

I’d argue that building a relocatable, redeployable artifact is the job of the build process. The only communication between the build process and the deployment process should be the creation of the artifact, and the registration of the artifact metadata.

At deploy time, the only action should be to take the artifact and put it into the running system. It doesn’t matter whether this is to rsync over the php and NOHUP the server, or transfer a self executing jar and run it on the server. From docker containers to go executables, elf binaries to zip files of code, the deployable artifact should remain inviolate for its entire life.

The reason for this is that deployments should be repeatable, consistent and reliable each and every time. Any build step that gets put into that deployment tends to break that model, and cause deployments that cannot be relied upon.

Michael Brunton-Spall on …

Using AWS Workspaces to control access to documents

Locking down the data

Authenticating the users

Create a certificate authority

Creating a client certificate

Getting OSX to use the certificate

Addendum: A better approach for MacOS

What is legal basis under GDPR?

Consent

Contract

Legal obligation

Vital interests

Public Task

Legitimate Interests

Brexit and data transfers

An example organisation

What is a data controller and data processor?

Offshoring

Current legal status

What is a data transfer?

What about the US?

So known unknowns?

What if I move my data to AWS UK region?

Discovery, Alpha, Beta, Live… Part 2

TLDR

The Phases, Team and Budget structures

Discovery

Prototyping

MVP

Build

Live

Getting started in programming with Advent of Code and Python

Install anything extra you need

A text editor

Installing Python

A github account

Lets get started with Python

Structuring your code

Solving Day 1, Part 1

Part 2

The basics are sometimes the hardest things

Nudge or strategy?

Language forms mental furrows

Organisations are made of people

Build and Deploy are different concerns