IBM acquires Italy’s MyInvenio to integrate process mining directly into its suite of automation tools

Automation has become a big theme in enterprise IT, with organizations using RPA, no-code and low-code tools, and other technology to speed up work and bring more insight and analytics into how they do things every day. Today IBM is announcing an acquisition as it hopes to take on a bigger role in providing those automation services: the IT giant has acquired myInvenio, an Italian startup that builds and operates process mining software.

Process mining is the part of the automation stack that tracks data produced by a company’s software, as well as how the software works, in order to provide guidance on what a company could and should do to improve its processes. In the case of myInvenio, the company’s approach involves making a “digital twin” of an organization to help track and optimize processes. IBM is interested in how myInvenio’s tools are able to monitor data in areas like sales, procurement, production and accounting to help organizations identify what might be better served with more automation, which it can in turn run using RPA or other tools as needed.
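myInvenio has not published how its engine works, so the following is only a generic illustration of the core process mining idea: reconstructing each case's sequence of steps from an event log and counting which activity directly follows which, along with the average wait between them. The event log, the case IDs and the activity names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical event log rows: (case_id, activity, timestamp in days).
EVENT_LOG = [
    ("PO-1", "create_order", 1),
    ("PO-1", "approve", 2),
    ("PO-1", "pay", 5),
    ("PO-2", "create_order", 1),
    ("PO-2", "approve", 3),
    ("PO-2", "rework", 4),
    ("PO-2", "approve", 6),
    ("PO-2", "pay", 9),
]


def directly_follows(log):
    """Return {(a, b): (count, mean_wait)} for every pair where activity b
    immediately follows activity a within the same case."""
    traces = defaultdict(list)
    # Order events by case, then by time, to reconstruct each case's trace.
    for case_id, activity, ts in sorted(log, key=lambda row: (row[0], row[2])):
        traces[case_id].append((activity, ts))

    counts = Counter()
    total_wait = Counter()
    for steps in traces.values():
        for (a, t_a), (b, t_b) in zip(steps, steps[1:]):
            counts[(a, b)] += 1
            total_wait[(a, b)] += t_b - t_a
    return {edge: (n, total_wait[edge] / n) for edge, n in counts.items()}


if __name__ == "__main__":
    for (a, b), (n, wait) in sorted(directly_follows(EVENT_LOG).items()):
        print(f"{a} -> {b}: {n} case(s), mean wait {wait:.1f} days")
```

In this toy log the rework loop (approve, rework, approve) and the slowest hand-off (approve to pay, three days on average) are exactly the kind of signal that would point to a candidate for automation.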

Terms of the deal are not being disclosed. It is not clear if myInvenio had any outside investors (we’ve asked and are awaiting a response). This is the second acquisition IBM has made in Italy. (The first, in 2014, was a company called CrossIdeas, which now forms part of the company’s security business.)

IBM and myInvenio are not exactly strangers: the two inked a deal as recently as November 2020 to integrate the Italian startup’s technology into IBM’s bigger automation services business globally.

Dinesh Nirmal, GM of IBM Automation, said in an interview that the reason IBM acquired the company was two-fold. First, it lets IBM integrate the technology more closely into the company’s Cloud Pak for Business Automation, which sits on and is powered by Red Hat OpenShift and has other automation capabilities already embedded within it, specifically robotic process automation (RPA), document processing, workflows and decisions.

Second, and perhaps more importantly, it means IBM will no longer have to compete with myInvenio’s other solution partners for priority on behalf of its customers. IBM will be the sole provider.

“Partnerships are great but in a partnership you also have the option to partner with others, and when it comes to priority who decides?” he said. “From the customer perspective, will they work just on our deal, or others first? Now, our customers will get the end result of this… We can bring a single solution to an end user or an enterprise, saying, ‘look you have document processing, RPA, workflow, mining.’ That is the beauty of this and what customers will see.”

He said that IBM currently serves customers across a range of verticals including financial, insurance, healthcare and manufacturing with its automation products.

Notably, this is not the first acquisition that IBM has made to build out this stack. Last year, it acquired WDG to expand into robotic process automation.

And interestingly, it’s not even the only partnership that IBM has had in process mining. Just earlier this month, it announced a deal with one of the bigger names in the field, Celonis, a German startup valued at $2.5 billion in 2019.

Ironically, at the time, my colleague Ron wondered aloud why IBM wasn’t just buying Celonis outright in that deal. It’s hard to say whether price was one reason. Remember: we don’t know the terms of this acquisition, but given that myInvenio was off the fundraising radar, chances are it cost a little less than Celonis would have.

We’ve asked and IBM has confirmed that it will continue to work with Celonis alongside now offering its own native process mining tools.

“In keeping with IBM’s open approach and $1 billion investment in ecosystem, [Global Business Services, IBM’s enterprise services division] works with a broad range of technologies based on client and market demand, including IBM AI and Automation software,” a spokesperson said in a statement. “Celonis focuses on execution management which supports GBS’ transformation of clients’ business processes through intelligent workflows across industries and domains. Specifically, Celonis has deep connectivity into enterprise systems such as Salesforce, SAP, Workday or ServiceNow, so the Celonis EMS platform helps GBS accelerate clients’ transformations and BPO engagements with these ERP platforms.”

Indeed, at the end of the day, companies that offer services, especially suites of services, are working in environments where they have to be open to customers using their own technology, or bringing in something else.

There may have been another force pushing IBM to bring more of this technology in-house, and that’s the wider competitive climate. Earlier this year, SAP acquired another European startup in the process mining space, Signavio, in a deal reportedly worth about $1.2 billion. As more of these companies get snapped up by would-be IBM rivals, and those left standing are working with a plethora of other parties, maybe it was high time for IBM to make sure it had its own horse in the race.

“Through IBM’s planned acquisition of myInvenio, we are revolutionizing the way companies manage their process operations,” said Massimiliano Delsante, CEO, myInvenio, who will be staying on with the deal. “myInvenio’s unique capability to automatically analyze processes and create simulations — what we call a ‘Digital Twin of an Organization’ —  is joining with IBM’s AI-powered automation capabilities to better manage process execution. Together we will offer a comprehensive solution for digital process transformation and automation to help enterprises continuously transform insights into action.”

Bigeye (formerly Toro) scores $17M Series A to automate data quality monitoring

As companies create machine learning models, operations teams need to ensure that the data used for those models is of sufficient quality, a process that can be time-consuming. Bigeye (formerly Toro), an early-stage startup, is helping by automating data quality monitoring.

Today the company announced a $17 million Series A led by Sequoia Capital, with participation from existing investor Costanoa Ventures. That brings the total raised to $21 million, including the $4 million seed round the startup raised last May.

When we spoke to Bigeye CEO and co-founder Kyle Kirwan last May, he said the seed round was going to be focused on hiring a team (they are 11 people now) and building more automation into the product, and he says they have achieved that goal.

“The product can now automatically tell users what data quality metrics they should collect from their data, so they can point us at a table in Snowflake or Amazon Redshift or whatever and we can analyze that table and recommend the metrics that they should collect from it to monitor the data quality — and we also automated the alerting,” Kirwan explained.

He says that the company is focusing on data operations issues affecting the model’s inputs, such as a table that isn’t updating when it’s supposed to, missing rows or duplicate entries. Bigeye can automate alerts for those kinds of issues and speed up the process of getting model data ready for training and production.
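Bigeye’s actual checks and API are not described in the announcement. As a rough sketch of the three issue types Kirwan mentions (a stale table, missing rows, duplicate entries), the following assumes a hypothetical TableSnapshot summary of a warehouse table and hand-set thresholds; Bigeye says its product recommends such metrics automatically rather than requiring them to be configured by hand.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TableSnapshot:
    """Hypothetical summary of a warehouse table, gathered at check time."""
    name: str
    last_updated: datetime
    row_count: int
    primary_keys: list


def check_table(snap, now, max_staleness, min_rows):
    """Return alert messages for stale data, missing rows and duplicate keys."""
    alerts = []
    # Freshness: has the table been updated within the expected window?
    if now - snap.last_updated > max_staleness:
        alerts.append(f"{snap.name}: stale, last updated {snap.last_updated:%Y-%m-%d %H:%M}")
    # Volume: did fewer rows arrive than expected?
    if snap.row_count < min_rows:
        alerts.append(f"{snap.name}: missing rows ({snap.row_count} < {min_rows})")
    # Uniqueness: does any primary key appear more than once?
    dupes = len(snap.primary_keys) - len(set(snap.primary_keys))
    if dupes:
        alerts.append(f"{snap.name}: {dupes} duplicate key(s)")
    return alerts


if __name__ == "__main__":
    now = datetime(2021, 4, 14, 12, 0)
    snap = TableSnapshot("orders", now - timedelta(hours=30), 3, [101, 102, 102])
    for alert in check_table(snap, now, timedelta(hours=24), min_rows=5):
        print(alert)
```

In a real deployment the snapshot would come from queries against Snowflake, Redshift or a similar warehouse, and the alerts would be routed to a pager or chat channel instead of printed.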

Bogomil Balkansky, the partner at Sequoia who is leading today’s investment, sees the company attacking an important part of the machine learning pipeline. “Having spearheaded the data quality team at Uber, Kyle and Egor have a clear vision to provide always-on insight into the quality of data to all businesses,” Balkansky said in a statement.

As the founding team begins building the company, Kirwan says that building a diverse team is a key goal for them and something they are keenly aware of.

“It’s easy to hire a lot of other people that fit a certain mold, and we want to be really careful that we’re doing the extra work to [understand that just because] it’s easy to source people within our network, we need to push and make sure that we’re hiring a team that has different backgrounds and different viewpoints and different types of people on it because that’s how we’re going to build the strongest team,” he said.

Bigeye offers on-prem and SaaS solutions, and while it’s working with paying customers like Instacart, Crux Informatics, and Lambda School, the product won’t be generally available until later in the year.

Tecton teams with founder of Feast open source machine learning feature store

Tecton, the company that pioneered the notion of the machine learning feature store, has teamed up with the founder of the open source feature store project called Feast. Today the company announced the release of version 0.10 of the open source tool.

The feature store is a concept that the Tecton founders came up with when they were engineers at Uber. Shortly thereafter, an engineer named Willem Pienaar read the founders’ Uber blog posts on building a feature store and went to work building Feast as an open source version of the concept.

“The idea of Tecton [involved bringing] feature stores to the industry, so we build basically the best in class, enterprise feature store. […] Feast is something that Willem created, which I think was inspired by some of the early designs that we published at Uber. And he built Feast and it evolved as kind of like the standard for open source feature stores, and it’s now part of the Linux Foundation,” Tecton co-founder and CEO Mike Del Balso explained.

Tecton later hired Pienaar, who is today an engineer at the company where he leads their open source team. While the company did not originally start off with a plan to build an open source product, the two products are closely aligned, and it made sense to bring Pienaar on board.

“The products are very similar in a lot of ways. So I think there’s a similarity there that makes this somewhat symbiotic, and there is no explicit convergence necessary. The Tecton product is a superset of what Feast has. So it’s an enterprise version with a lot more advanced functionality, but at Feast we have a battle-tested feature store that’s open source,” Pienaar said.

As we wrote in a December 2020 story on the company’s $35 million Series B, Tecton describes a feature store as “an end-to-end machine learning management system that includes the pipelines to transform the data into what are called feature values, then it stores and manages all of that feature data and finally it serves a consistent set of data.”
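To make the three parts of that definition concrete (a pipeline that turns raw data into feature values, storage for those values, and consistent serving), here is a deliberately tiny in-memory sketch. ToyFeatureStore and total_spend_pipeline are hypothetical names and do not reflect Feast’s or Tecton’s APIs. The point it illustrates is that one lookup path can serve both the latest value for online predictions and the value “as of” a past timestamp for building training data, so the two never disagree.

```python
from bisect import bisect_right
from collections import defaultdict


class ToyFeatureStore:
    """Hypothetical in-memory feature store (not the Feast or Tecton API)."""

    def __init__(self):
        # entity_id -> feature name -> list of (timestamp, value), kept sorted by time
        self._data = defaultdict(lambda: defaultdict(list))

    def ingest(self, entity_id, feature, ts, value):
        series = self._data[entity_id][feature]
        series.append((ts, value))
        series.sort(key=lambda point: point[0])

    def get(self, entity_id, feature, as_of=None):
        """Serve the latest value at or before `as_of`; None means 'now'."""
        series = self._data[entity_id][feature]
        if not series:
            return None
        if as_of is None:
            return series[-1][1]
        idx = bisect_right([ts for ts, _ in series], as_of)
        return series[idx - 1][1] if idx else None


def total_spend_pipeline(events):
    """Transformation step: raw (user, ts, amount) purchases -> running-total feature values."""
    totals = defaultdict(float)
    for user, ts, amount in sorted(events, key=lambda e: e[1]):
        totals[user] += amount
        yield user, "total_spend", ts, totals[user]


if __name__ == "__main__":
    store = ToyFeatureStore()
    raw = [("u1", 1, 10.0), ("u1", 5, 20.0), ("u2", 3, 7.5)]
    for row in total_spend_pipeline(raw):
        store.ingest(*row)
    print(store.get("u1", "total_spend"))           # online lookup: 30.0
    print(store.get("u1", "total_spend", as_of=3))  # point-in-time lookup: 10.0
```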

Del Balso says that from a business perspective, contributing to the open source feature store exposes his company to a different group of users, and the commercial and open source products can feed off one another as they build the two products.

“What we really like, and what we feel is very powerful here, is that we’re deeply in the Feast community and get to learn from all of the interesting use cases […] to improve the Tecton product. And similarly, we can use the feedback that we’re hearing from our enterprise customers to improve the open source project. That’s the kind of cross learning, and ideally that feedback loop involved there,” he said.

The plan is for Tecton to continue being a primary contributor with a team inside Tecton dedicated to working on Feast. Today, the company is releasing version 0.10 of the project.

Why XDR Vendors Must Build, Buy, and Partner

As 2021 starts with record breaches, security teams continue to evaluate whether they have the best products in their toolbelt, looking for products that work together to provide security that is greater than the sum of its parts. Customers will be better protected and more efficient when they buy from a vendor offering its own best-in-class products complemented with external integrations from other vendors.

In many cases, security teams are already upgrading to best-in-class tools, but owning the best often means having many vendors in a technology stack. Even for organizations buying for price, surveys show that the average number of security tools within an organization is still greater than 50. The solution is neither vendor consolidation nor vendor integration, but both. CISOs don’t want or expect to go from 50 tools to one. They want to go from 50 tools to a dozen well-integrated ones to gain efficiency and security efficacy.

How easily a security product integrates with other vendors’ products is an essential factor today, but it will become a deciding factor as XDR (extended detection and response) becomes a standard. Increasingly sophisticated attacks exploit endpoint, identity, email, network, and the rest of the stack, so the full stack must work together to shut down the attack at every point.

Integrated Vendor Portfolios

Security vendors who approach XDR by building product portfolios will simplify security as long as they add products within their domain of expertise and integrate them by merging two sets of workflows into one.

Endpoint vendors, for example, can simplify a customer’s duties by expanding into more endpoint types. Natural extensions include Windows, macOS, Linux, iOS, Android, and ChromeOS. Merging those alerts and devices into shared workflows saves users time. Every new product a company builds or acquires should not mean another new administrative area for the customer.

Expanding from EDR into XDR is another example of building on a core competency. The best endpoint products are built on engines that correlate behavioral data points to block threats. Extending those engines to correlate behavioral data beyond the endpoint couldn’t be more natural, at least for the vendors who built their security on top of industry-leading engines that are only supplemented by managed security services.

Integrated Vendors

The pitfall that vendors have often found themselves in is that when the portfolio expands beyond the vendor’s area of expertise, its quality can suffer. No leading endpoint vendor is also a leading firewall vendor. No leading identity vendor is also a leading network security vendor. The vendors with the most extensive portfolios often have many sub-par products in their mix, and yet they are the vendors that offer the fewest integrations with other vendors.

Product portfolios that are walled in cause their customers more work and expose them to unnecessary threats at a time when the stakes of effective security have never been higher. Closed-off ecosystems may be effective by 2030, but integration is the most critical thing to do for 2025. Vendors should continue to look for areas within their circle of competence to expand, but they should also complement those offerings with a robust ecosystem of partnerships with other market-leading vendors.

Plug and Play Integrations

For a few well-funded security teams, a lack of integration between products can be worked around. Those well-funded teams can buy a SIEM and a SOAR, connect all those best-in-class tools, and dedicate a few people to writing SIEM detection rules and SOAR playbooks.

How, though, is the more common, classically understaffed security team supposed to manage all its tools? Detecting and responding across the stack in real time, without expensive playbook software, requires XDR.

Extended detection and response (XDR) is not just a marketing term, despite there being too much marketing describing it and not enough features delivering it. It is the technology being built to level the playing field.

With XDR, any team of any size can buy the best tools, regardless of vendor, and expect its products to work together to automatically detect at the campaign level and respond everywhere the attacker is.
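How a production XDR engine correlates telemetry, including SentinelOne’s, is proprietary and far more involved than anything shown here. As a toy illustration of what “detect at the campaign level” means, this sketch assumes hypothetical alerts from different tools (email, identity, endpoint, network) that each name the users and hosts involved, and merges any alerts that share an entity within a time window into one incident using union-find.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str      # hypothetical labels: "email", "identity", "endpoint", "network"
    entities: tuple  # users and hosts the alert refers to
    ts: int          # seconds since an arbitrary epoch


def correlate(alerts, window):
    """Merge alerts that share a user or host and occur within `window` seconds of
    the previous alert on that entity; merging is transitive, so one chain of
    related alerts across tools becomes one incident."""
    parent = list(range(len(alerts)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    last_seen = {}  # entity -> (index of its most recent alert, timestamp)
    order = sorted(range(len(alerts)), key=lambda i: alerts[i].ts)
    for i in order:
        for entity in alerts[i].entities:
            if entity in last_seen:
                j, t = last_seen[entity]
                if alerts[i].ts - t <= window:
                    parent[find(i)] = find(j)
            last_seen[entity] = (i, alerts[i].ts)

    incidents = {}
    for i in order:
        incidents.setdefault(find(i), []).append(alerts[i])
    return list(incidents.values())


if __name__ == "__main__":
    alerts = [
        Alert("email", ("alice",), 0),
        Alert("identity", ("alice",), 120),
        Alert("endpoint", ("alice", "host-7"), 300),
        Alert("endpoint", ("bob", "host-9"), 350),
        Alert("network", ("host-7",), 400),
    ]
    for incident in correlate(alerts, window=600):
        print([a.source for a in incident])
```

Here a phishing email, a suspicious login, an endpoint detection and a network alert collapse into a single incident because they chain through the user “alice” and the host “host-7”, while the unrelated alert on “bob” stays separate. A responder can then act on every point of that incident at once.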

XDR is the central, automated hub that customers have always asked for; it simply took the technology years to begin delivering on it.

Discover, Connect, Extend

Integration is critical, whether it be internal integration within a product portfolio or external integration with an ecosystem of partners.

SentinelOne started its XDR journey by expanding from securing Windows workstations to securing macOS, Linux, and cloud. They all exist in a single console with shared workflows for policy, alerts, and our Deep Visibility event view.

The next evolution was to go beyond SentinelOne. We developed Threat Intelligence partnerships with industry leaders like Recorded Future and ReversingLabs, which allowed us to bring the best Threat Intelligence into our best-in-class endpoint security.

When we set out to build these partnerships, we knew it was just the beginning. With our number of partners rapidly expanding, we envisioned a marketplace that would make it easy to find and enable integrations within just a few minutes and clicks.

Our recently externalized SentinelOne Marketplace is built to serve that vision of XDR. In the seconds it takes to input API keys for a selected vendor, the marketplace will make that handshake and begin working together with other best-in-breed technology.


Singularity Marketplace is part of our platform, so once an integration is set up, the effect becomes visible within the product. In the case of enabling a Threat Intelligence integration, threat intelligence is enriched into the product almost immediately. Marketplace makes discovery and integration as easy as online shopping.

Respond with Machine Speed

As Marketplace grows, our integrations will span more tool types, ingest more data from across those tools, and do more with it. Like everything we do, XDR will operate at machine speed. That’s where Scalyr comes in.

Our recent acquisition of Scalyr will enable SentinelOne to ingest more data, analyze for correlation, and prevent more threats. Regardless of whether the data source is from a new SentinelOne product or unstructured data from an integration with another vendor, Scalyr will connect the dots.

Further still, STAR (Storyline Active Response) sits on top of Scalyr. While much of STAR is still confidential, we can share that it is a highly customizable response tool developed to supply powerful XDR response capabilities on or beyond the endpoint.


Conclusion

Security is a rapidly evolving market with startups offering new innovations to help customers stay secure against an intensifying barrage of attacks. Ten years ago, the industry was largely composed of a handful of firewall and legacy AV vendors. A decade later, it’s unrecognizable.

This is an industry where invention will not stop because it cannot stop. As long as ransoms are paid, as long as rapid innovation creates new intellectual property to be stolen, as long as Bitcoin appreciates into ever more lucrative loot, and as long as compromising digital infrastructure offers money and power, attacks will continue to grow in their complexity.

To put it bluntly, the security market is being driven forward at breakneck pace by a virtual arms race. The blue team’s tech stack has to be ever better to combat the undeniable innovation we’ve seen in red team efforts. XDR is emerging because there is a pressing need for connective tissue that will allow the new but disparate defensive tools to operate in a more hive-like manner. Over the next five years, vendors will largely be divided into those who deeply integrated their products with others and those who did not.

While thoughtful acquisitions will help, for every acquisition a new vendor with a new tool will emerge to take its place. The challenge of today is to build the connected defense network needed to connect tomorrow’s evolving tools.

To meet that challenge, SentinelOne is extending by launching new products within our domain expertise and partnering with other market leaders via Marketplace. To extend detection, we will ingest and correlate data from beyond the endpoint with Scalyr. Lastly, by leveraging STAR, we will respond wherever the attack is happening.

The future is an XDR-driven future. Specialized security products must work together to defend against an intensifying effort to overrun the digital barriers that protect our now technology-dependent lives. Security vendors preparing for this future should expand and strengthen their technology while also building an architecture to ingest from anywhere, correlate any data set, and respond wherever needed.



Upstack raises $50M for its platform and advisory to help businesses plan and buy for digital transformation

Digital transformation has been one of the biggest catchphrases of the past year, with many an organization forced to reckon with aging IT, a lack of digital strategy, or simply the challenges of growth after being faced with newly remote workforces, customers doing everything online and other tech demands.

Now Upstack, a startup that has built a platform to help those businesses evaluate how to grapple with those next steps (including planning and costing out different options and scenarios, and then ultimately buying solutions), is announcing financing to fund some growth of its own.

The New York startup has picked up funding of $50 million, money that it will be using to continue building out its platform and expanding its services business.

The funding is coming from Berkshire Partners, and it’s being described as an “initial investment”. The firm, which makes private equity and late-stage growth investments, typically puts between $100 million and $1 billion in its portfolio companies so this could end up as a bigger number, especially when you consider the size of the market that Upstack is tackling: the cloud and internet infrastructure brokerage industry generates annual revenues “in excess of $70 billion,” the company estimates.

We’re asking about the valuation, but PitchBook notes that the median valuation in Berkshire’s deals is around $211 million. Upstack had previously raised around $35 million.

Upstack today already provides tools to large enterprises, government organizations, and smaller businesses to compare offerings and plan out pricing for different scenarios covering a range of IT areas, including private, public and hybrid cloud deployments; data center investments; network connectivity; business continuity and mobile services, and the plan is to bring in more categories to the mix, including unified communications and security.

Notably, Upstack itself is profitable, and many of the customers it names are themselves tech companies (they include Cisco, Accenture, cloud storage company Backblaze, Riverbed and Lumen), a sign that digital transformation, and planning for it, is not necessarily a core competency even for digital businesses, let alone for companies outside tech. It says it has helped complete over 3,700 IT projects across 1,000 engagements to date.

“Upstack was founded to bring enterprise-grade advisory services to businesses of all sizes,” said Christopher Trapp, founder and CEO, in a statement. “Berkshire’s expertise in the data center, connectivity and managed services sectors aligns well with our commitment to enabling and empowering a world-class ecosystem of technology solutions advisors with a platform that delivers higher value to their customers.”

The core of Upstack’s proposition is a platform that system integrators, or advisors, plus end users themselves, can use to design and compare pricing for different services and solutions. This is an unsung but critical aspect of the ecosystem: We love to hear and write about all the interesting enterprise technology that is being developed, but the truth of the matter is that buying and using that tech is never just a simple click on a “buy” button.

Even for smaller organizations, buying tech can be a hugely time-consuming task. It involves evaluating different companies and what they have to offer — which can differ widely in the same category, and gets more complex when you start to compare different technological approaches to the same problem.

It also includes the task of designing solutions to fit one’s particular network. And finally, there are the calculations needed to determine the real cost of services once implemented in an organization. Upstack’s platform also gives users the ability to present their work, which forms a critical part of the evaluation and decision-making process. When you think about all of this, it’s no wonder that so many organizations have opted to follow the “if it ain’t broke, don’t fix it” school of digital strategy.
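Upstack has not described its pricing model, but a simple total-cost-of-ownership comparison shows why “the real cost of services once implemented” is more than the sticker price. This sketch assumes a hypothetical model in which each quote has a one-time cost, a monthly fee and a number of internal staff hours needed to run it; all vendors and figures are made up.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """Hypothetical vendor quote; all figures are made up for illustration."""
    vendor: str
    upfront: float       # one-time installation/migration cost
    monthly_fee: float   # recurring service fee per month
    staff_hours: float   # internal staff hours per month to operate it


def total_cost(q, months, hourly_rate):
    """Total cost of ownership over the contract term, including staff time."""
    return q.upfront + months * (q.monthly_fee + q.staff_hours * hourly_rate)


def cheapest(quotes, months, hourly_rate):
    return min(quotes, key=lambda q: total_cost(q, months, hourly_rate))


if __name__ == "__main__":
    quotes = [Quote("Vendor A", 5_000, 1_200, 10), Quote("Vendor B", 20_000, 800, 2)]
    for term in (12, 36):
        best = cheapest(quotes, term, hourly_rate=50)
        print(term, best.vendor, total_cost(best, term, 50))
```

With these numbers Vendor A is cheaper over a 12-month term (25,400 versus 30,800), but Vendor B wins over 36 months (52,400 versus 66,200), which is why comparing scenarios, not just quotes, matters.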

As technology has evolved, the concept of digital transformation itself has become more complicated, making tools like Upstack’s more in demand both by companies and the people they hire to do this work for them. Upstack also employs a group of about 15 advisors — consultants — who also provide insight and guidance in the procurement process, and it seems some of the funding will also be used to invest in expanding that team.

(Incidentally, the model of balancing technology with human experts is one used by other enterprise startups that are built around the premise of helping businesses procure technology: BlueVoyant, a security startup that has built a platform to help businesses manage and use different security services, also retains advisors who are experts in that field.)

The advisors are part of the business model: Upstack’s customers can either pay Upstack a consulting fee to work with its advisors, or Upstack receives a commission from suppliers that a company ends up using, having evaluated and selected them via the Upstack platform.

The company competes with traditional systems integrators and consultants, but it seems that the fact that it has built a tech platform that some of its competitors also use is one reason why it’s caught the eye of investors, and also seen strong growth.

Indeed, when you consider the breadth of services that a company might use within its infrastructure (whether it’s software to run sales or marketing, AI to power product recommendations on a site, business intelligence or RPA), it will be interesting to see whether and how Upstack considers deeper moves into these areas.

“Upstack has quickly become a leader in a large, rapidly growing and highly fragmented market,” said Josh Johnson, principal at Berkshire Partners, in a statement. “Our experience has reinforced the importance of the agent channel to enterprises designing and procuring digital infrastructure. Upstack’s platform accelerates this digital transformation by helping its advisors better serve their enterprise customers. We look forward to supporting Upstack’s continued growth through M&A and further investment in the platform.”

PlexTrac raises $10M Series A round for its collaboration-centric security platform

PlexTrac, a Boise, ID-based security service that aims to provide a unified workflow automation platform for red and blue teams, today announced that it has raised a $10 million Series A funding round led by Noro-Moseley Partners and Madrona Venture Group. StageDot0 Ventures also participated in this round, which the company plans to use to build out its team and grow its platform.

With this new round, the company, which was founded in 2018, has now raised a total of $11 million, with StageDot0 leading its 2019 seed round.


“I have been on both sides of the fence, the specialist who comes in and does the assessment, produces that 300-page report and then comes back a year later to find that some of the critical issues had not been addressed at all. And not because the organization didn’t want to but because it was lost in that report,” PlexTrac CEO and President Dan DeCloss said. “These are some of the most critical findings for an entity from a risk perspective. By making it collaborative, both red and blue teams are united on the same goal we all share, to protect the network and assets.”

With an extensive career in security that included time as a penetration tester for Veracode and the Mayo Clinic, as well as senior information security advisor for Anthem, among other roles, DeCloss has quite a bit of first-hand experience that led him to found PlexTrac. Specifically, he believes that it’s important to break down the wall between offense-focused red teams and defense-centric blue teams.


“Historically there has been more of the cloak and dagger relationship but those walls are breaking down– and rightfully so, there isn’t that much of that mentality today– people recognize they are on the same mission whether they are internal security team or an external team,” he said. “With the PlexTrac platform the red and blue teams have a better view into the other teams’ tactics and techniques – and it makes the whole process into an educational exercise for everyone.”

At its core, PlexTrac makes it easier for security teams to produce their reports, freeing them up to focus on ‘real’ security work. To do so, the service integrates with most of the popular scanners, like Qualys and Veracode, as well as tools like ServiceNow and Jira to help teams coordinate their workflows. All the data flows into real-time reports that help teams monitor their security posture. The service also features a dedicated tool, WriteupsDB, for managing reusable write-ups to help teams deliver consistent reports for a variety of audiences.

“Current tools for planning, executing, and reporting on security testing workflows are either nonexistent (manual reporting, spreadsheets, documents, etc…) or exist as largely incomplete features of legacy platforms,” Madrona’s S. Somasegar and Chris Picardo write in today’s announcement. “The pain point for security teams is real and PlexTrac is able to streamline their workflows, save time, and greatly improve output quality. These teams are on the leading edge of attempting to find and exploit vulnerabilities (red teams) and defend and/or eliminate threats (blue teams).”

 

Microsoft Patch Tuesday, April 2021 Edition

Microsoft today released updates to plug at least 110 security holes in its Windows operating systems and other products. The patches include four security fixes for Microsoft Exchange Server — the same systems that have been besieged by attacks on four separate (and zero-day) bugs in the email software over the past month. Redmond also patched a Windows flaw that is actively being exploited in the wild.

Nineteen of the vulnerabilities fixed this month earned Microsoft’s most-dire “Critical” label, meaning they could be used by malware or malcontents to seize remote control over vulnerable Windows systems without any help from users.

Microsoft released updates to fix four more flaws in Exchange Server versions 2013-2019 (CVE-2021-28480, CVE-2021-28481, CVE-2021-28482, CVE-2021-28483). Interestingly, all four were reported by the U.S. National Security Agency, although Microsoft says it also found two of the bugs internally. A Microsoft blog post published along with today’s patches urges Exchange Server users to make patching their systems a top priority.

Satnam Narang, staff research engineer at Tenable, said these vulnerabilities have been rated ‘Exploitation More Likely’ using Microsoft’s Exploitability Index.

“Two of the four vulnerabilities (CVE-2021-28480, CVE-2021-28481) are pre-authentication, meaning an attacker does not need to authenticate to the vulnerable Exchange server to exploit the flaw,” Narang said. “With the intense interest in Exchange Server since last month, it is crucial that organizations apply these Exchange Server patches immediately.”

Also patched today was a vulnerability in Windows (CVE-2021-28310) that’s being exploited in active attacks already. The flaw allows an attacker to elevate their privileges on a target system.

“This does mean that they will either need to log on to a system or trick a legitimate user into running the code on their behalf,” said Dustin Childs of Trend Micro. “Considering who is listed as discovering this bug, it is probably being used in malware. Bugs of this nature are typically combined with other bugs, such as a browser bug or PDF exploit, to take over a system.”

In a technical writeup on what they’ve observed since finding and reporting attacks on CVE-2021-28310, researchers at Kaspersky Lab noted the exploit they saw was likely used together with other browser exploits to escape “sandbox” protections of the browser.

“Unfortunately, we weren’t able to capture a full chain, so we don’t know if the exploit is used with another browser zero-day, or coupled with known, patched vulnerabilities,” Kaspersky’s researchers wrote.

Allan Laska, senior security architect at Recorded Future, notes that this month’s updates also fix several remote code execution vulnerabilities in Microsoft Office products. CVE-2021-28454 and CVE-2021-28451 involve Excel, while CVE-2021-28453 is in Microsoft Word and CVE-2021-28449 is in Microsoft Office. All four vulnerabilities are labeled by Microsoft as “Important” (not quite as bad as “Critical”). These vulnerabilities impact all versions of their respective products, including Office 365.

Other Microsoft products that got security updates this month include Edge (Chromium-based), Azure and Azure DevOps Server, SharePoint Server, Hyper-V, Team Foundation Server, and Visual Studio.

Separately, Adobe has released security updates for Photoshop, Digital Editions, RoboHelp, and Bridge.

It’s a good idea for Windows users to get in the habit of updating at least once a month, but for regular users (read: not enterprises) it’s usually safe to wait a few days until after the patches are released, so that Microsoft has time to iron out any kinks in the new armor.

But before you update, please make sure you have backed up your system and/or important files. It’s not uncommon for a Windows update package to hose one’s system or prevent it from booting properly, and some updates have been known to erase or corrupt files.

So do yourself a favor and back up before installing any patches. Windows 10 even has some built-in tools to help you do that, either on a per-file/folder basis or by making a complete and bootable copy of your hard drive all at once.

And if you wish to ensure Windows has been set to pause updating so you can back up your files and/or system before the operating system decides to reboot and install patches on its own schedule, see this guide.

As always, if you experience glitches or problems installing any of these patches this month, please consider leaving a comment about it below; there’s a better-than-even chance other readers have experienced the same and may chime in here with some helpful tips.

Docugami’s new model for understanding documents cuts its teeth on NASA archives

You hear so much about data these days that you might forget that a huge amount of the world runs on documents: a veritable menagerie of heterogeneous files and formats holding enormous value yet incompatible with the new era of clean, structured databases. Docugami plans to change that with a system that intuitively understands any set of documents and intelligently indexes their contents — and NASA is already on board.

If Docugami’s product works as planned, anyone will be able to take piles of documents accumulated over the years and near-instantly convert them to the kind of data that’s actually useful to people.

Because it turns out that running just about any business ends up producing a ton of documents. Contracts and briefs in legal work, leases and agreements in real estate, proposals and releases in marketing, medical charts, etc., etc. Not to mention the various formats: Word docs, PDFs, scans of paper printouts of PDFs exported from Word docs, and so on.

Over the last decade there’s been an effort to corral this problem, but movement has largely been on the organizational side: put all your documents in one place, share and edit them collaboratively. Understanding the document itself has pretty much been left to the people who handle them, and for good reason — understanding documents is hard!

Think of a rental contract. We humans understand that when the renter is named as Jill Jackson, later references to “the renter” also mean that person. Furthermore, across a hundred other contracts, we understand that the renters in those documents are the same type of person or concept in the context of the document, but not the same actual person. These are surprisingly difficult concepts for machine learning and natural language understanding systems to grasp and apply. Yet if they could be mastered, an enormous amount of useful information could be extracted from the millions of documents squirreled away around the world.

What’s up, .docx?

Docugami founder Jean Paoli says they’ve cracked the problem wide open, and while it’s a major claim, he’s one of few people who could credibly make it. Paoli was a major figure at Microsoft for decades, and among other things helped create the XML format. You know all those files that end in x, like .docx and .xlsx? Paoli is at least partly to thank for them.

“Data and documents aren’t the same thing,” he told me. “There’s a thing you understand, called documents, and there’s something that computers understand, called data. Why are they not the same thing? So my first job [at Microsoft] was to create a format that can represent documents as data. I created XML with friends in the industry, and Bill accepted it.” (Yes, that Bill.)

The formats became ubiquitous, yet 20 years later the same problem persists, having grown in scale with the digitization of industry after industry. But for Paoli the solution is the same. At the core of XML was the idea that a document should be structured almost like a webpage: boxes within boxes, each clearly defined by metadata — a hierarchical model more easily understood by computers.

Illustration showing a document corresponding to pieces of another document.

Image Credits: Docugami

“A few years ago I drank the AI kool-aid, got the idea to transform documents into data. I needed an algorithm that navigates the hierarchical model, and they told me that the algorithm you want does not exist,” he explained. “The XML model, where every piece is inside another, and each has a different name to represent the data it contains — that has not been married to the AI model we have today. That’s just a fact. I hoped the AI people would go and jump on it, but it didn’t happen.” (“I was busy doing something else,” he added, to excuse himself.)

The lack of compatibility with this new model of computing shouldn’t come as a surprise — every emerging technology carries with it certain assumptions and limitations, and AI has focused on a few other, equally crucial areas like speech understanding and computer vision. The approach taken there doesn’t match the needs of systematically understanding a document.

“Many people think that documents are like cats. You train the AI to look for their eyes, for their tails … documents are not like cats,” he said.

It sounds obvious, but it’s a real limitation. Advanced AI methods like segmentation, scene understanding, multimodal context, and such are all a sort of hyperadvanced cat detection that has moved beyond cats to detect dogs, car types, facial expressions, locations, etc. Documents are too different from one another, or in other ways too similar, for these approaches to do much more than roughly categorize them.

As for language understanding, it’s good in some ways but not in the ways Paoli needed. “They’re working sort of at the English language level,” he said. “They look at the text but they disconnect it from the document where they found it. I love NLP people, half my team is NLP people — but NLP people don’t think about business processes. You need to mix them with XML people, people who understand computer vision, then you start looking at the document at a different level.”

Docugami in action

Illustration showing a person interacting with a digital document.

Image Credits: Docugami

Paoli’s goal couldn’t be reached by adapting existing tools (beyond mature primitives like optical character recognition), so he assembled his own private AI lab, where a multidisciplinary team has been tinkering away for about two years.

“We did core science, self-funded, in stealth mode, and we sent a bunch of patents to the patent office,” he said. “Then we went to see the VCs, and SignalFire basically volunteered to lead the seed round at $10 million.”

Coverage of the round didn’t really get into the actual experience of using Docugami, but Paoli walked me through the platform with some live documents. I wasn’t given access myself and the company wouldn’t provide screenshots or video, saying it is still working on the integrations and UI, so you’ll have to use your imagination … but if you picture pretty much any enterprise SaaS service, you’re 90% of the way there.

As the user, you upload any number of documents to Docugami, from a couple dozen to hundreds or thousands. These enter a machine understanding workflow that parses the documents, whether they’re scanned PDFs, Word files, or something else, into an XML-esque hierarchical organization unique to the contents.

“Say you’ve got 500 documents, we try to categorize it in document sets, these 30 look the same, those 20 look the same, those five together. We group them with a mix of hints coming from how the document looked, what it’s talking about, what we think people are using it for, etc.,” said Paoli. Other services might be able to tell the difference between a lease and an NDA, but documents are too diverse to slot into pre-trained ideas of categories and expect it to work out. Every set of documents is potentially unique, and so Docugami trains itself anew every time, even for a set of one. “Once we group them, we understand the overall structure and hierarchy of that particular set of documents, because that’s how documents become useful: together.”

Illustration showing a document being turned into a report and a spreadsheet.

Image Credits: Docugami

That doesn’t just mean it picks up on header text and creates an index, or lets you search for words. The data that is in the document, for example who is paying whom, how much and when, and under what conditions, all that becomes structured and editable within the context of similar documents. (It asks for a little input to double-check what it has deduced.)

It can be a little hard to picture, but now just imagine that you want to put together a report on your company’s active loans. All you need to do is highlight the information that’s important to you in an example document — literally, you just click “Jane Roe” and “$20,000” and “five years” anywhere they occur — and then select the other documents you want to pull corresponding information from. A few seconds later you have an ordered spreadsheet with names, amounts, dates, anything you wanted out of that set of documents.
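Docugami hasn’t published how its extraction works, so the following is only a rough Python illustration of the workflow described above: locate the highlighted values in one example document, then read the same fields from every other document in the set. It assumes the documents have already been converted into a shared XML-style hierarchy; the sample loans, tag names and the helpers `path_to` and `extract_table` are hypothetical, and it skips entirely the hard part Docugami claims to solve, which is inferring that hierarchy from raw PDFs and Word files.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical loan agreements, assumed to be already converted into a shared
# hierarchical (XML-style) structure. All tag names here are invented.
DOCS = [
    "<loan><parties><borrower>Jane Roe</borrower></parties>"
    "<terms><principal>$20,000</principal><term>five years</term></terms></loan>",
    "<loan><parties><borrower>John Doe</borrower></parties>"
    "<terms><principal>$35,000</principal><term>three years</term></terms></loan>",
]


def path_to(node: ET.Element, value: str, prefix: str = "") -> Optional[str]:
    """Return the tag path (e.g. 'parties/borrower') of the first element under
    `node` whose text equals `value`, mimicking a user clicking on that value."""
    for child in node:
        path = f"{prefix}/{child.tag}" if prefix else child.tag
        if (child.text or "").strip() == value:
            return path
        found = path_to(child, value, path)
        if found is not None:
            return found
    return None


def extract_table(docs: List[str], example: int,
                  highlighted: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Find the highlighted values in one example document, then read the
    element at the same path from every document in the set."""
    roots = [ET.fromstring(d) for d in docs]
    paths = []
    for value in highlighted:
        path = path_to(roots[example], value)
        if path is None:
            raise ValueError(f"{value!r} not found in the example document")
        paths.append(path)
    rows = [[(root.findtext(p) or "").strip() for p in paths] for root in roots]
    return paths, rows


# Highlight three values in the first loan, then pull the same fields from all loans.
paths, rows = extract_table(DOCS, 0, ["Jane Roe", "$20,000", "five years"])
print(paths)
for row in rows:
    print(row)
```

Matching by tag path only works here because every document shares the same structure; the real system, as described, has to learn that shared structure per document set.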

All this data is meant to be portable too, of course — there are integrations planned with various other common pipes and services in business, allowing for automatic reports, alerts if certain conditions are reached, automated creation of templates and standard documents (no more keeping an old one around with underscores where the principals go).

Remember, all of this is ready half an hour after you upload the documents in the first place, with no labeling, pre-processing or cleaning required. And the AI isn’t working from some preconceived notion or format of what a lease document looks like. It’s learned all it needs to know from the actual docs you uploaded: how they’re structured, where things like names and dates figure relative to one another, and so on. And it works across verticals and uses an interface anyone can figure out in a few minutes. Whether you’re in healthcare data entry or construction contract management, the tool should make sense.

The web interface where you ingest and create new documents is one of the main tools, while the other lives inside Word. There Docugami acts as a sort of assistant that’s fully aware of every other document of the type you’re working on, so you can create new ones, fill in standard information, comply with regulations and so on.

Okay, so processing legal documents isn’t exactly the most exciting application of machine learning in the world. But I wouldn’t be writing this (at all, let alone at this length) if I didn’t think this was a big deal. This sort of deep understanding of document types can be found here and there among established industries with standard document types (such as police or medical reports), but have fun waiting until someone trains a bespoke model for your kayak rental service. But small businesses have just as much value locked up in documents as large enterprises — and they can’t afford to hire a team of data scientists. And even the big organizations can’t do it all manually.

NASA’s treasure trove

Image Credits: NASA

The problem is extremely difficult, yet to humans seems almost trivial. You or I could glance through 20 similar documents and pull out a list of names and amounts easily, perhaps even in less time than it takes for Docugami to crawl them and train itself.

But AI, after all, is meant to imitate and transcend human capacity, and it’s one thing for an account manager to do monthly reports on 20 contracts and quite another to do a daily report on a thousand. Yet Docugami handles both equally easily. That is where it fits into enterprise systems, where scaling this kind of operation is crucial, and into NASA, which is buried under a backlog of documentation from which it hopes to glean clean data and insights.

If there’s one thing NASA’s got a lot of, it’s documents. Its reasonably well-maintained archives go back to its founding, and many important ones are available by various means — I’ve spent many a pleasant hour perusing its cache of historical documents.

But NASA isn’t looking for new insights into Apollo 11. Through its many past and present programs, solicitations, grant programs, budgets, and of course engineering projects, it generates a huge number of documents; it is, after all, very much a part of the federal bureaucracy. And as with any large organization with its paperwork spread over decades, NASA’s document stash represents untapped potential.

Expert opinions, research precursors, engineering solutions, and a dozen more categories of important information are sitting in files searchable perhaps by basic word matching but otherwise unstructured. Wouldn’t it be nice for someone at JPL to get it in their head to look at the evolution of nozzle design, and within a few minutes have a complete and current list of documents on that topic, organized by type, date, author and status? What about the patent advisor who needs to provide a NIAC grant recipient with information on prior art? Shouldn’t they be able to pull up the relevant old patents and applications with more specificity than just every document containing a given keyword?

The NASA SBIR grant, awarded last summer, isn’t for any specific work, like collecting all the documents of such and such a type from Johnson Space Center or something. It’s an exploratory or investigative agreement, as many of these grants are, and Docugami is working with NASA scientists on the best ways to apply the technology to their archives. (One of the best applications may be to the SBIR and other small business funding programs themselves.)

Another SBIR grant with the NSF differs in that, while at NASA the team is looking into better organizing tons of disparate types of documents with some overlapping information, at NSF they’re aiming to better identify “small data.” “We are looking at the tiny things, the tiny details,” said Paoli. “For instance, if you have a name, is it the lender or the borrower? The doctor or the patient name? When you read a patient record, penicillin is mentioned, is it prescribed or prohibited? If there’s a section called allergies and another called prescriptions, we can make that connection.”
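Paoli doesn’t describe how Docugami makes the allergies/prescriptions connection, and its approach is machine-learned rather than rule-based. Purely as a toy illustration of why the enclosing section matters, here is a Python sketch that labels each mention of a drug by the section it sits in. The record, the section names and the `SECTION_ROLES` mapping are hypothetical, and the sketch assumes the record’s sections have already been identified.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical patient record whose sections have already been identified.
RECORD = (
    "<record>"
    "<patient>Alex Smith</patient>"
    "<allergies><item>penicillin</item></allergies>"
    "<prescriptions><item>ibuprofen 400 mg</item></prescriptions>"
    "</record>"
)

# Hypothetical rule: the enclosing section decides what a drug mention means.
SECTION_ROLES = {"allergies": "prohibited", "prescriptions": "prescribed"}


def classify_mentions(xml_text: str, term: str) -> List[Tuple[str, str]]:
    """Label each mention of `term` with the role implied by its section."""
    root = ET.fromstring(xml_text)
    labels = []
    for section in root:
        role = SECTION_ROLES.get(section.tag)
        if role is None:
            continue  # e.g. the <patient> element carries no drug semantics
        for element in section.iter():
            text = (element.text or "").strip()
            if term.lower() in text.lower():
                labels.append((text, role))
    return labels


print(classify_mentions(RECORD, "penicillin"))
print(classify_mentions(RECORD, "ibuprofen"))
```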

“Maybe it’s because I’m French”

When I pointed out the rather small budgets involved with SBIR grants and how his company couldn’t possibly survive on these, he laughed.

“Oh, we’re not running on grants! This isn’t our business. For me, this is a way to work with scientists, with the best labs in the world,” he said, while noting many more grant projects were in the offing. “Science for me is a fuel. The business model is very simple: a service that you subscribe to, like DocuSign or Dropbox.”

The company is only just now beginning its real business operations, having made a few connections with integration partners and testers. But over the next year it will expand its private beta and eventually open it up — though there’s no timeline on that just yet.

“We’re very young. A year ago we were like five, six people, now we went and got this $10 million seed round and boom,” said Paoli. But he’s certain that this is a business that will be not just lucrative but will represent an important change in how companies work.

“People love documents. Maybe it’s because I’m French,” he said, “but I think text and books and writing are critical — that’s just how humans work. We really think people can help machines think better, and machines can help people think better.”

Zoho launches new low code workflow automation product

Workflow automation has been one of the key trends this year so far, and Zoho, a company known for its suite of affordable business tools, has joined the parade with a new low code workflow product called Qntrl (pronounced “control”).

Zoho’s Rodrigo Vaca, who is in charge of Qntrl’s marketing, says that most of the solutions we’ve been seeing are built for larger enterprise customers. Zoho is aiming for the mid-market with a product that requires less technical expertise than traditional business process management tools.

“We enable customers to design their workflows visually without the need for any particular kind of prior knowledge of business process management notation or any kind of that esoteric modeling or discipline,” Vaca told me.

While Vaca says Qntrl could require some technical help to connect a workflow to more complex backend systems like CRM or ERP, it allows a less technical end user to drag and drop the components and then get help to finish the rest.

“We certainly expect that when you need to connect to NetSuite or SAP you’re going to need a developer. If nothing else, the IT guys are going to ask questions, and they will need to provide access,” Vaca said.

He believes this product is putting this kind of tooling in reach of companies that may have been left out of workflow automation for the most part, or which have been using spreadsheets or other tools to create crude workflows. With Qntrl, you drag and drop components, and then select each component and configure what happens before, during and after each step.

What’s more, Qntrl provides a central place for processing and understanding what’s happening within each workflow at any given time, and who is responsible for completing it.

We’ve seen bigger companies like Microsoft, SAP, ServiceNow and others offering this type of functionality over the last year as low code workflow automation has taken center stage in business.

This has become a more pronounced need during the pandemic, when so many workers could not be in the office. It made moving work into more automated workflows all the more imperative, and we have seen companies add more of this kind of functionality as a result.

Brent Leary, principal analyst at CRM Essentials, says that Zoho is attempting to remove some of the complexity from this kind of tool.

“It handles the security pieces to make sure the right people have access to the data and processes used in the workflows in the background, so regular users can drag and drop to build their flows and processes without having to worry about that stuff,” Leary told me.

Qntrl is available starting today at $7 per user per month.

Meroxa raises $15M Series A for its real-time data platform

Meroxa, a startup that makes it easier for businesses to build the data pipelines to power both their analytics and operational workflows, today announced that it has raised a $15 million Series A funding round led by Drive Capital. Existing investors Root, Amplify and Hustle Fund also participated in this round, which together with the company’s previously undisclosed $4.2 million seed round now brings total funding in the company to $19.2 million.

The promise of Meroxa is that businesses can use a single platform for their various data needs and won’t need a team of experts to build and then manage their infrastructure. At its core, Meroxa provides a single Software-as-a-Service solution that connects relational databases to data warehouses and then helps businesses operationalize that data.

Image Credits: Meroxa

“The interesting thing is that we are focusing squarely on relational and NoSQL databases into data warehouse,” Meroxa co-founder and CEO DeVaris Brown told me. “Honestly, people come to us as a real-time Fivetran or real-time data warehouse sink. Because, you know, the industry has moved to this [extract, load, transform] format. But the beautiful part about us is, because we do change data capture, we get that granular data as it happens.” And businesses want this very granular data to be reflected inside of their data warehouses, Brown noted, but he also stressed that Meroxa can expose this stream of data as an API endpoint or point it to a webhook.

The company is able to do this because its core architecture is somewhat different from other data pipeline and integration services that, at first glance, seem to offer a similar solution.

Image Credits: Meroxa

“We aren’t a point-to-point solution,” Meroxa co-founder and CTO Ali Hamidi explained. “When you set up the connection, you aren’t taking data from Postgres and only putting it into Snowflake. What’s really happening is that it’s going into our intermediate stream. Once it’s in that stream, you can then start hanging off connectors and say, ‘Okay, well, I also want to peek into the stream, I want to transfer my data, I want to filter out some things, I want to put it into S3.’”

Because of this, users can use the service to connect different tools to their data warehouse but also build real-time tools to utilize the real-time data stream. With this flexibility, Hamidi noted, a lot of the company’s customers start with a pretty standard use case and then quickly expand into other areas as well.
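To make the difference between a point-to-point pipe and an intermediate stream concrete, here is a minimal in-memory Python sketch of change-data-capture events flowing into one stream that several connectors hang off. It is not Meroxa’s code or API: `ChangeEvent`, `Stream`, `filtered` and the lists standing in for Snowflake and S3 are hypothetical, and a production system would sit on a durable, replayable log rather than Python lists and synchronous callbacks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ChangeEvent:
    """A single row-level change captured from a source database (CDC)."""
    table: str
    op: str  # "insert", "update" or "delete"
    row: Dict[str, object]


Connector = Callable[[ChangeEvent], None]


@dataclass
class Stream:
    """Hub stream: the source writes each change once, and every attached
    connector receives every event."""
    connectors: List[Connector] = field(default_factory=list)

    def attach(self, connector: Connector) -> None:
        self.connectors.append(connector)

    def publish(self, event: ChangeEvent) -> None:
        for connector in self.connectors:
            connector(event)


def filtered(keep: Callable[[ChangeEvent], bool],
             transform: Callable[[ChangeEvent], object],
             sink: Callable[[object], None]) -> Connector:
    """Build a connector that filters and transforms events before a sink."""
    def connector(event: ChangeEvent) -> None:
        if keep(event):
            sink(transform(event))
    return connector


# Plain lists stand in for a warehouse table and an S3 bucket.
warehouse: List[ChangeEvent] = []
s3_bucket: List[object] = []

stream = Stream()
stream.attach(warehouse.append)  # every change lands in the "warehouse"
stream.attach(filtered(lambda e: e.op == "delete",
                       lambda e: {"deleted_id": e.row["id"]},
                       s3_bucket.append))  # only deletes, reshaped, go to "S3"

stream.publish(ChangeEvent("orders", "insert", {"id": 1, "total": 40}))
stream.publish(ChangeEvent("orders", "delete", {"id": 1}))

print(len(warehouse), s3_bucket)
```

The design point is that adding a destination means attaching another connector to the existing stream, not building another source-to-destination pipe, which is the flexibility Hamidi describes customers growing into.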

Brown and Hamidi met during their time at Heroku, where Brown was a director of product management and Hamidi a lead software engineer. But while Heroku made it very easy for developers to publish their web apps, there wasn’t anything comparable in the highly fragmented database space. The team acknowledges that there are a lot of tools that aim to solve these data problems, but few of them focus on the user experience.

Image Credits: Meroxa

“When we talk to customers now, it’s still very much an unsolved problem,” Hamidi said. “It seems kind of insane to me that this is such a common thing and there is no ‘oh, of course you use this tool because it addresses all my problems.’ And so the angle that we’re taking is that we see user experience not as a nice-to-have, it’s really an enabler, it is something that enables a software engineer or someone who isn’t a data engineer with 10 years of experience in wrangling Kafka and Postgres and all these things. […] That’s a transformative kind of change.”

It’s worth noting that Meroxa uses a lot of open-source tools, and the company has also committed to open-sourcing everything in its data plane. “This has multiple wins for us, but one of the biggest incentives is in terms of the customer, we’re really committed to having our agenda aligned. Because if we don’t do well, we don’t serve the customer. If we do a crappy job, they can just keep all of those components and run it themselves,” Hamidi explained.

Today, Meroxa, which the team founded in early 2020, has over 24 employees (and is 100% remote). “I really think we’re building one of the most talented and most inclusive teams possible,” Brown told me. “Inclusion and diversity are very, very high on our radar. Our team is 50% black and brown. Over 40% are women. Our management team is 90% underrepresented. So not only are we building a great product, we’re building a great company, we’re building a great business.”