Porting a Node.js Module to Rust
As part of my Rust learning process, I decided to take an existing personal project that was actually being used and port it to Rust. The project I chose was a Node.js module that syncs the NIST CVE data feeds into a local cache so that any CVE ID can be supplied and its details returned to the caller. The project was originally conceived while I was participating in the pkgsrc-security team's vulnerability triage rotation. I had experienced an outage of the NIST website during my shift that prevented me from looking up CVE details for vulnerabilities. Without that information, I couldn't accurately determine whether the vulnerabilities in our feed impacted pkgsrc packages. Not an ideal situation! Thus, the node-nvd-search-{cli} projects were created and put to use for any future outages, and also just for quick CVE detail lookups.
The module is fairly simple: it uses the published NIST CVE
feeds and their associated metadata to sync the feed's JSON files
locally and update them when needed. Then a JSON streaming
parser is used to search them by CVE ID. If a match is found, the
full CVE JSON details are returned.
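The "update them when needed" step can be sketched roughly as follows. This is a minimal illustration rather than the module's actual code: it assumes each feed's metadata file is a set of plain `key:value` lines that includes `lastModifiedDate` and `sha256` fields, and the `FeedMeta`, `parse_meta`, and `needs_sync` names are hypothetical.

```rust
use std::collections::HashMap;

/// The metadata fields that matter when deciding whether to re-sync a feed.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct FeedMeta {
    pub last_modified: String,
    pub sha256: String,
}

/// Parse the `key:value` lines of a metadata file.
/// Returns `None` if either expected field is missing.
pub fn parse_meta(text: &str) -> Option<FeedMeta> {
    // Split on the first ':' only, since timestamps contain colons too.
    let fields: HashMap<&str, &str> = text
        .lines()
        .filter_map(|line| line.trim().split_once(':'))
        .collect();

    Some(FeedMeta {
        last_modified: fields.get("lastModifiedDate")?.to_string(),
        sha256: fields.get("sha256")?.to_string(),
    })
}

/// A feed needs to be fetched again if it was never cached,
/// or if the remote hash differs from the cached one.
pub fn needs_sync(cached: Option<&FeedMeta>, remote: &FeedMeta) -> bool {
    match cached {
        None => true,
        Some(local) => local.sha256 != remote.sha256,
    }
}
```

Comparing hashes rather than timestamps means a feed is only downloaded again when its contents have really changed.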
This module eventually proved useful in some vulnerability management automation efforts at $WORK too, and gets exercised daily as part of Trivy scans which sync vulnerability information into Jira. (I'd love to open source this utility, which is affectionately named "CVEzilla", but it's tightly bound to our internal processes and not readily usable by others in its current state.)
As part of the port to Rust, I also wanted to try storing the CVE details in SQLite. I had intended to do so for the Node.js version some time ago, but (at the time of writing) the SQLite module is sadly not well maintained, so I have been hesitant to use it. Aside from adopting SQLite, I also wanted to combine the library and CLI into a single repo; in hindsight, it didn't make much sense to split them up into separate repositories.
It is worth noting that I also looked into using sled, which is an embedded key-value database and would be more than ample for this project's needs, but some unanswered questions around patents scared me away. Sled's README also stated:
"if reliability is your primary constraint, use SQLite."
I truly appreciate the honesty. It also doesn't hurt that I've had good success using SQLite in other projects in the past.
With my war plans laid out, I started working on the new Rust version and had something working after a couple of weeks of evening and weekend hacking. I used reqwest to fetch the feeds in batches, just like the Node.js version. I even had some tests, much like the first project. I also added the ability to choose a feed mirror. Overall, things came together pretty quickly.
The challenges I experienced during this port weren't so much due to Rust as to the changes I wanted to make in the functionality. My first attempt at syncing the feeds locally into SQLite took 48 minutes. By comparison, the Node.js version only took about 30 seconds. The size of the gap surprised me, but I was switching from blindly streaming a file to disk to parsing a gigantic JSON document and performing thousands of inserts, so I had anticipated some challenges in this area.
The fix for this issue turned out to be surprisingly simple: use transactions. I hadn't realized how critical transactions were for performance, but using them shaved the full sync time down to 2 minutes, which was a better place to be since a full sync is usually only done once, after the first install. Future syncs just fetch any updated feeds and only take a few seconds. The current full sync time is around 20-45 seconds on a reasonably modern machine running NetBSD or Linux. That is still slower than the Node.js version, but searching is significantly faster, and there's still room for improving the initial full sync time.
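Why transactions matter so much: outside an explicit transaction, SQLite treats every statement as its own transaction and commits it individually, so thousands of inserts mean thousands of commits. The sketch below is a toy model of that difference, not the project's code. `Store` and `CountingStore` are hypothetical stand-ins for a real SQLite connection so that the example runs without any crates; the mock only counts commits, with each commit standing in for a durable write to disk.

```rust
/// Hypothetical stand-in for a database connection.
pub trait Store {
    fn begin(&mut self);
    fn insert(&mut self, cve_id: &str, json: &str);
    fn commit(&mut self);
}

/// Mock store that counts rows and commits instead of touching a database.
#[derive(Default)]
pub struct CountingStore {
    pub rows: usize,
    pub commits: usize,
    in_transaction: bool,
}

impl Store for CountingStore {
    fn begin(&mut self) {
        self.in_transaction = true;
    }

    fn insert(&mut self, _cve_id: &str, _json: &str) {
        self.rows += 1;
        // Without an explicit transaction, each statement commits on its own.
        if !self.in_transaction {
            self.commits += 1;
        }
    }

    fn commit(&mut self) {
        self.in_transaction = false;
        self.commits += 1;
    }
}

/// One implicit commit per inserted row.
pub fn insert_autocommit<S: Store>(store: &mut S, items: &[(String, String)]) {
    for (id, json) in items {
        store.insert(id, json);
    }
}

/// A single commit for the whole batch.
pub fn insert_in_transaction<S: Store>(store: &mut S, items: &[(String, String)]) {
    store.begin();
    for (id, json) in items {
        store.insert(id, json);
    }
    store.commit();
}
```

With a real SQLite binding the shape is the same: begin a transaction before the loop over a feed's CVE entries and commit once after it.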
When experimenting with how many async tasks to throw at feed fetching, I noticed that more async tasks didn't really speed things up by much. In fact, I ended up removing the async code entirely and barely noticed any difference. I did notice that the code got a bit simpler, so I stuck with reqwest's blocking API. I intend to make choosing between async and blocking a Cargo feature but have not implemented it yet. More async tasks also meant much higher peak memory consumption, as multiple gigantic JSON blobs were deserialized into memory at once so they could be iterated over and inserted or updated in the database.
High peak memory usage is still one of the last things I'd like to improve, as is the module API. But as it stands now, it works™ (even on Windows), and I enjoyed the challenges and the porting experience. I'll continue maintaining the Node.js version as long as it's being used too, but it likely won't gain any new features.
You're welcome to try out the project for yourself at the links below: