Buggy behavior bites .NET SqlClient, but only for those not using Windows

.NET devs have been struggling to deal with errors affecting non-Windows SqlClients under heavy load

Back in February, .NET software developers using Microsoft.Data.SqlClient, an open source data access driver for Microsoft SQL Server, noticed that certain queries were slow or timed out on Linux under specific circumstances.

The issue (#442), reported on GitHub, has lingered unfixed for almost a year now.

In July, Nerijus Arlauskas, a developer based in Lithuania, found that non-Windows clients (macOS, Linux, WSL, Docker) sometimes returned invalid results for queries, a separate but perhaps related issue (#659).

It’s a potentially serious problem when a database provides inaccurate information. “Under no circumstances a SELECT statement should return a different result,” Arlauskas wrote in his report. “This can cause application crashes, personal data leaks, users purchasing products on behalf of other users, and security breaches.”

Or as was said in the 1984 film Ghostbusters, “Human sacrifice, dogs and cats living together, mass hysteria!”

Fortunately, these errors occur only rarely – on systems running 2,000 or more concurrent connections, among other qualifying conditions – but that makes the root cause harder to diagnose and repair.

About a month ago, Cheena Malhotra, lead developer at US-based Magnitude Software, submitted a pull request that addresses various other bugs arising from asynchronous operations interfering with one another. The changes have been merged into the SqlClient codebase but they haven’t resolved issues #442 and #659.

Issue #442 has been causing problems since well before it was reported in February. In his writeup earlier this year, Pawel Pabich, engineering manager at Octopus Cloud in Brisbane, Australia, said, “We’ve been battling this issue for a long time now so we are happy to help in any way we can to get it resolved.”

Pabich explained that Octopus Cloud hosts Octopus Deploy instances in Linux containers on Azure AKS with data stored in Azure Files and Azure SQL. Several months prior to his February post, he said, the company noticed some SQL queries were slow or timing out, something it had never seen on Windows under the .NET Framework. He suggested the SqlClient might have something to do with this.

This bug is vexing enough that some developers are weighing drastic measures. Samm Desmond, co-founder of blockchain biz Nodesmith and a software engineer at Shelf Engine, wrote in a comment on Wednesday, “We’ve been having some major problems and have started rewriting anything that touches the database to use async as [Malhotra] recommended, but it’s a massive refactor for us. …We’re also considering moving back to Windows because it sounds like that would resolve the issue here?”

A solution for Issue #659 has proven to be similarly elusive. There is some hope that the alterations submitted by Malhotra may work, but those changes appear to require additional review and testing before they get deployed.

To help identify the situations where things go sideways, Alessio Franceschelli, senior principal engineer at Trainline in the UK, has created a containerized simulator for the SqlClient using docker-compose.

Grab a seat and some popcorn. It may be a while. ®
