Open source software is everywhere. From your server to your fitness band. And it’s only becoming more common, as over 90% of developers acknowledge using open source.
The further down the line we discover a software defect, the more it costs to fix and recover from, whether it’s a catastrophic bug that crashes a spacecraft, or a more day-to-day bug, human error, or plain scaling issue that takes a service down.
As an industry, we have to ship software faster and faster each day, with the limited attention span of a huge, fast-moving organism, and the realization has come slowly: as software eats the world, software defects, and more critically security defects, become a bigger problem as well.
Security defects have a unique trait built into them: as long as you have something worth exploiting, they are actively searched for, probed, and exploited by an adversary. Unlike with a traditional software defect, the cost of a security defect, and the probability that the worst-case scenario will actually happen, can be very real and, unfortunately, very big.
This is why, with an effort similar to the one invested in taming software defects, the industry as a whole is looking to tame security defects, or security vulnerabilities, at the start of software production and in the design phases; or, to use the popular term, shifting left security.
Shifting left security means adopting and adapting much the same processes, practices, tools and capabilities that exist in the parallel world of “traditional” software testing and defect mitigation, and applying them to security testing and mitigating security vulnerabilities.
The big game-changers that we can borrow from the testing world are:
The premise of DevSecOps and shifting left security is the same: align security with the rest of the software development lifecycle.
The immediate gains of shifting the discovery of security defects left are:
In the general sense, we’ve already seen one of the most disruptive changes to our industry: the move from Ops to DevOps.
For someone looking to build developer-first security tools, to integrate them, or to go through the process of selecting and designing their own solution based on them, we need to answer the following:
The criteria or checklist that we need revolves specifically around:
One of the popular criteria for determining whether a security tool is a developer-friendly tool is the question: “does it work in my CI?”.
Many security tools have CI integration challenges that stem from their design. A funny take on this comes from Abhay Bhargav, who talked about the phenomenon of security tool narcissism:
“Run MY tool, see MY dashboard!”
This self-centric and developer-unfriendly approach inverts control over where you want to exercise your guards. Sometimes developers just want to run a tool. Period.
The fact that you need to fiddle with a UI to accomplish a task destroys the chances of shifting left security. Want to run a scan? Use the UI. Want to see results? Log into the system. Want quick feedback? Log into the system and check the status. All of these are bothersome, and miss the point of building for the zen of development.
In addition, some of these security tools have inherent security design flaws when running against your securely kept assets. Whether at the code analysis stage or the deployment stage (such as building a Docker container ready for deployment), a tool that moves your assets (such as your code) off-perimeter, essentially copying everything to a remote system that doesn’t belong to you for the sake of a security analysis, is painfully dangerous: if you had a security issue (such as exposed sensitive customer data, credit cards, or secrets), it now exists in two physical places, one that is yours, and one stored at a vendor that is not you.
Many, if not all, CI systems meter by the minute, and some of them have OS-specific multipliers. GitHub Actions, for example, as of the time of this writing, bills macOS minutes at a 10x multiplier compared to Linux. If a tool takes a long time to run, it may become ineffective and expensive to run.
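To make the metering concrete, here’s a minimal back-of-envelope sketch. The 10x macOS multiplier matches GitHub Actions billing at the time of writing; the six-minute scan duration is a made-up figure for illustration:

```python
# Back-of-envelope sketch: how a per-minute OS multiplier inflates CI cost.
# The 10x macOS multiplier matches GitHub Actions billing at the time of
# writing; the scan duration below is an illustrative figure.

def billable_minutes(run_minutes: float, os_multiplier: int) -> float:
    """CI systems bill raw runtime scaled by an OS-specific multiplier."""
    return run_minutes * os_multiplier

# A 6-minute security scan, run on every push:
linux_cost = billable_minutes(6, 1)    # 6 billable minutes
macos_cost = billable_minutes(6, 10)   # 60 billable minutes

print(linux_cost, macos_cost)
```

Multiply that by every push, on every branch, across a team, and a slow tool quickly dominates the CI bill.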
CI is all about creating the primary feedback cycle for a team, and the major concern is the speed of that feedback cycle. CI workflows tend to start lean, and as the team and codebase evolve, they become heavy and long running.
If you went through this process once or twice in your career, watching a fast CI workflow become dead slow, you learned to keep in mind that every additional tool and process affects CI job time.
When building a CI-friendly security tool, think about investing in a technology that runs quickly and keeps the overall CI times in check.
Fancy, cryptic, hacker tools exist only in movies. In real life, when we hit a problem we want to understand it in simple terms and fix it in simple terms. This means:
When running in my CI, don’t copy my assets off premise, don’t phone home without my knowledge, and don’t send my data anywhere else.
In addition, don’t break my existing security guards such as:
CI systems support a test matrix and using it is one of CI’s best values: test the same thing over a matrix of possibilities – quickly. This is exactly the thing a developer cannot do reliably on their machine due to hardware and existing software differences, or wouldn’t want to invest time in (I certainly won’t).
To support the benefits of such a test matrix, if you’re building or adopting a tool for your CI, it should be able to run on many operating systems, and many architectures (did someone say ARM?).
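As a sketch of what this looks like in practice, here’s a hypothetical GitHub Actions workflow that runs the same scan step across an OS matrix. The runner labels are real; `security-scan` is a made-up placeholder for whatever tool you adopt:

```yaml
# Hypothetical GitHub Actions workflow: the same scan across a matrix of
# operating systems. "security-scan" is a placeholder tool name.
name: security-matrix
on: [push]
jobs:
  scan:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Run security scan
        run: ./security-scan --ci
```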
It is almost unbelievable that JUnit test reports are the standard of reporting in all modern CI systems; JUnit, being the archetype and workhorse of modern unit testing frameworks, doesn’t disappoint here either.
A tool that behaves well on CI should be able to output JUnit-formatted results even though it doesn’t really run unit tests. The format is flexible enough that every form of testing can be baked into it, including security testing.
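As a sketch of how little this takes, a tool can emit findings as a JUnit-style XML report with nothing but the standard library. The element and attribute names below follow the common JUnit report shape that CI systems consume; the finding itself is made up:

```python
# Sketch: emitting a security finding as a JUnit-style XML report using only
# the standard library. Element/attribute names follow the common JUnit
# report shape; the finding below is illustrative.
import xml.etree.ElementTree as ET

findings = [
    {"rule": "hardcoded-secret", "file": "app/config.py",
     "message": "credential found in source"},
]

suite = ET.Element("testsuite", name="security-scan",
                   tests=str(len(findings)), failures=str(len(findings)))
for f in findings:
    case = ET.SubElement(suite, "testcase", classname=f["file"], name=f["rule"])
    failure = ET.SubElement(case, "failure", message=f["message"])
    failure.text = f"{f['file']}: {f['message']}"

report = ET.tostring(suite, encoding="unicode")
print(report)
```

A CI system picking this file up will render each finding as a failed test, with no extra dashboard required.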
With that said, for security-specific reporting we now have the relatively new SARIF format (Static Analysis Results Interchange Format). It is a rather new but powerful format to generate output for, with GitHub as its primary consumer at the time of this writing.
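For a feel of the format, here’s a minimal, hand-built SARIF 2.1.0 document. Only the skeleton fields are shown, and the scanner name, rule id, and finding are illustrative:

```python
# Sketch of a minimal SARIF 2.1.0 document. Only the skeleton fields are
# shown; the scanner name, rule id, and finding are illustrative.
import json

sarif = {
    "version": "2.1.0",
    "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
    "runs": [{
        "tool": {"driver": {"name": "example-scanner",
                            "rules": [{"id": "SEC001"}]}},
        "results": [{
            "ruleId": "SEC001",
            "level": "error",
            "message": {"text": "Hardcoded credential found"},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": "app/config.py"},
                "region": {"startLine": 12}}}],
        }],
    }],
}

print(json.dumps(sarif, indent=2))
```

Uploading a file like this to GitHub surfaces the findings as code-scanning alerts, annotated on the exact file and line.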
A CI system that is sometimes out of your reach (such as a cloud CI service) may require a deeper look into a run session. In that case, a tool should be able to output deeper reports with more data, so that they can be handled offline, for debugging or understanding the interaction between the CI system, the asset under analysis, and the tool doing the analysis.
For example, a CI system that uses containers as a virtualization mechanism, might change its base image without your notice or attention, and suddenly things will start breaking in surprising ways.
Often you’ll want to compose new workflows in your CI. Most CI systems already take streaming and exit codes into account, so the expectation is that developer- and CI-friendly tools behave in an expected way.
When you’ve found an issue, an exit code should signal the error state. Examining exit codes is a basic step in composing processes.
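A minimal sketch of this contract, with an illustrative exit-code convention (0 = clean, 1 = findings, 2 = tool error; the convention itself is an assumption here, not a standard):

```python
# Sketch: signal findings via the process exit code so CI and shell pipelines
# can compose on it. The convention (0 = clean, 1 = findings, 2 = tool error)
# is illustrative, not a standard.
import sys

def scan(paths):
    """Placeholder scan: returns a list of findings (empty when clean)."""
    return []  # a real tool would analyze `paths` here

def main(paths) -> int:
    try:
        findings = scan(paths)
    except Exception as err:
        print(f"error: {err}", file=sys.stderr)
        return 2
    for finding in findings:
        print(finding)
    return 1 if findings else 0

# As a CLI entry point you'd call: sys.exit(main(sys.argv[1:]))
```

With that in place, `your-tool && deploy.sh` in a pipeline does the right thing with no extra glue.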
A well-behaved developer tool should have as few dependencies as possible. Ideally it should be a self-contained binary. Even though you can build container images that wrap everything, and you can set up CI processes with all the required dependencies, these break as dependencies shift, as CI systems evolve, and as your code and requirements change. You don’t want to invest time in CI setup every time one of those changes.
Don’t surprise me. Principle of least surprise in short here:
“People are part of the system. The design should match the user’s experience, expectations, and mental models”.
In a developer tool context, one of the utmost concerns is for a tool to always be accessible and behave exactly the same as it did the last time you used it, given that you use the other tools in your toolbox from time to time (but not all the time).
An old flag is missing in the new release, or a script breaks because a feature was removed. Or worse, you try to use the tool and it decides to auto-update, and now you’re staring at your screen (looking at you, Windows!). All of this impairs productivity, which, given any developer’s time-stress these days, is already impaired by the fatigue of Slack, email, JIRA, information overload, tech complexity, bad project management, and more.
PS: This excludes the special and important case of critical updates and security updates — which in any case, I expect to be transparent, non-breaking, non-removing and non-surprising updates.
Don’t force me to use a UI to perform an action on my way to getting my job done. Put these steps in the tool, in the CLI (e.g. register, login, authenticate, main value, reporting).
Imagine if you wanted to publish a package on npm, but to do that you had to leave your terminal, go to your browser, log in, copy some kind of token back, set up a configuration file, and then get back to publishing; only to realize it’s the wrong account you’ve logged into in your browser, because your non-work profile lives there.
Nope. You log in through the tool you’re using and all of that happens automatically. “Don’t force me to use a UI” means: don’t force me to context switch, in any case, on the way to getting value from your tool.
An exception in this case would be an MFA login, which by-design forces you to add another factor and then possibly context switch to your phone or any other form of authenticator that you might have.
Support all operating systems. I don’t want to set up a VM for you, and I don’t want to run in Docker. Cross-compilation is easy and cheap; you just have to pick the right programming language for it.
Some tools and programming languages cargo-cult their platform gaps, whether it’s tools that rose to fame on Windows (SDR, Software Defined Radio, tools have their best versions only on Windows) or on Linux. As a developer, I’m using all of these platforms, with my Mac being the primary one.
It’s not uncommon for my best tools to go with me everywhere; the tools I highly appreciate, I’ll use in my Mac, Linux and Windows development workflows.
Have great docs for your tool, in addition to having great docs for your product.
Developer documentation should have a product manager too (that might be you). The days of man pages and “dry” docs are gone. Treat your docs like a product, and use JTBD (Jobs to Be Done) to gather requirements for them. If you use this same model, you’ll realize that you want code samples in your docs, and troubleshooting advice in the same paragraph.
Authoring docs to show off everything you’ve got, as opposed to helping someone perform a given task with the text you’re writing right now, makes a whole lot of difference.
When I’m not in the mood for reading the docs, tell me the right next step in the CLI itself. Also, --help should do what you would expect.
The pit of success principle means that if I’ve learned to do the first step with your tool, this step should teach me the next reasonable step to perform. A tool should not only do a certain job; it also has the role of an educator, one that gives me only the right information, at the right time.
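A sketch of what such an educator can look like in a CLI; the command names and hints are made up for illustration:

```python
# Sketch: a CLI that teaches the next reasonable step after each command.
# Command names and hints are made up for illustration.
import argparse

NEXT_STEP = {
    "scan": "Next: run `tool report` to view the findings.",
    "report": "Next: run `tool fix` to apply suggested remediations.",
}

def run(argv):
    parser = argparse.ArgumentParser(prog="tool")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("scan", help="scan the current project")
    sub.add_parser("report", help="show findings from the last scan")
    sub.add_parser("fix", help="apply suggested remediations")
    args = parser.parse_args(argv)
    # ... do the actual work for args.command here ...
    hint = NEXT_STEP.get(args.command)
    if hint:
        print(hint)
    return args.command

run(["scan"])  # prints the hint pointing at `tool report`
```

Each command ends by pointing at the next one, so a first-time user can discover the whole workflow without ever opening the docs.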
Nielsen set out three important limits for usability engineering:
0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.
1.0 second is about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.
10 seconds is about the limit for keeping the user’s attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.
Of these, with developer tools, you want to adopt the 100ms UI responsiveness limit. It is not only critical in graphical interfaces, but also important in the much-neglected realm of developers: the terminal.
There’s no denying that when you use a fast tool, it makes this fuzzy impression on you that makes you want to keep it. Speaking of fuzzy, fzf is a great example (no pun intended): it’s one of the fastest fuzzy searching tools, and I keep using it as a replacement for the standard history feature in shells.
Startup time makes a difference, and run time makes a difference. You can’t use Java for CLI tools; you can’t bring up a mammoth VM for a quick ride (GraalVM and the latest Project Leyden aside), and it may not even get to JITting at all.
Developers have a few places to get great, fast feedback; choosing which to use depends heavily on the nature of the analysis we want to do, and if the security tool can allow running under that specific context (some security tools can’t run on a dev machine because they’re too heavy).
Common to the different feedback cycles and their speeds is the following:
This would seem to be the fastest feedback cycle a developer can have. Sometimes it is: you get errors as you type, and indeed many code linters, which are a different type of analyzer, do this. Other times, this might be something we need to skip in favor of a slightly slower feedback cycle, most commonly when someone is working and the work isn’t “done” yet. In other words: “I know I’m not ready, please don’t bother me, clippy”.
In this case, it’s more appropriate to approach the feedback cycle just as you’d approach unit tests in development. In the same way unit tests run, you can run security tools.
Only one big requirement here: run fast. This may be a big ask for current-generation tools based on heavy AST data flow analysis, and for some an impossible one, as a run may take hours to complete. Even a few minutes may be too much.
Beyond unit tests, there is the staging step, where developers review their work and are ready to stage it into a codebase shared with others. In this staging step, we can use a git hook: a pre-commit hook will evaluate the material that’s about to be committed, and can block the commit if there is a violation.
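As a sketch, a pre-commit hook can be as small as the following. The secret patterns are illustrative, not a complete detection ruleset, and in a real repository the script would live at .git/hooks/pre-commit (marked executable):

```python
# Sketch of a pre-commit hook that blocks a commit when staged files contain
# an obvious secret. The patterns are illustrative, not a complete ruleset.
import re
import subprocess
import sys

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key id shape
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
]

def find_secrets(text: str):
    """Return the patterns that matched in `text` (empty list when clean)."""
    return [p.pattern for p in SECRET_PATTERNS if p.search(text)]

def staged_files():
    out = subprocess.run(["git", "diff", "--cached", "--name-only"],
                         capture_output=True, text=True, check=True)
    return [line for line in out.stdout.splitlines() if line]

def main() -> int:
    violations = []
    for path in staged_files():
        try:
            with open(path, encoding="utf-8", errors="ignore") as fh:
                violations += [(path, hit) for hit in find_secrets(fh.read())]
        except OSError:
            continue  # deleted or unreadable staged path
    for path, pattern in violations:
        print(f"{path}: matches secret pattern {pattern}", file=sys.stderr)
    return 1 if violations else 0  # non-zero exit blocks the commit

# As a hook you'd call: sys.exit(main())
```

Git runs the hook before the commit is created; a non-zero exit aborts it, which is exactly the exit-code contract discussed earlier.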
That about ends the story on the developer machine. The next step is to create a feedback cycle on the CI system itself. We’re still in the developer realm, but it also means such a feedback cycle is now in the several-minutes to tens-of-minutes time frame.
We’re also taking two new risks:
Whatever tool you’re running early in your pipeline now, it should correlate with some concept of threat, and be able to assist in risk mitigation and its prioritization.
Does this now mean that threat modeling becomes a bottleneck, if everything runs super fast, and tools are great citizens in your CI and on developer endpoints, delivering great security value?
The answer is yes and no. Just as when software moved to BDD and TDD, away from “QA on production”, and later ops moved to infrastructure as code (and still is moving), away from “click to bring up a cluster”, a few roles became overloaded, and a few years-old processes needed a refresh or were killed altogether.
Security processes need the same alignment with the TDD movement, as well as with the IaC movement and the cloud-native movement.
There is a trend here: as software understanding improves, we’re able to create more software that pushes the production and design of software to the natural place it should live in, like any other form of engineering: the design stage.
Because we don’t experience “waste” in software engineering, other than the ever-elusive loss of time, we’ve never been really keen on pushing this so strongly. Compare that to a carpenter who keeps building furniture but never measures, delivers a cabinet only to see it’s crooked, and then goes back to rebuild, throwing away a lot of wood; this person sees the material being wasted.
In fact, it’s inevitable that the role of the threat modeler is going to become strained. Therefore it has to align with some kind of more agile process; as critics of STRIDE already saw, we need common sense for where and how to apply threat modeling.
And so, as you shape up your developer tools for security, you need to do the same for your developer processes for security.
Some great open source developer-first and CI friendly security tooling that you should check:
And a few takeaways if you’re planning on integrating or building developer-friendly security tools: