Database Performance & Optimization

Off late I was thinking about some of the services that we can render from eXzaTech on the data & information management solutions and I suddenly remembered two interesting situations I handled sometime (of course quite long ago) back. I thought I should share them with you. Hence this article.

In the first, it was more to do with complaints from a software development company on the performance of Microsoft SQL Server 2005 Enterprise Edition on a server with Intel Itanium Processor. Software developers were complaining that SQL Server is not performing well on Itanium Processor and they were benchmarking the performance against Intel Xeon processor based server without emphasizing the fact that they were not comparing apple to apple. In their labs they were using Intel Xeon based servers with low volume test data whereas the production server was handling the very high volume, live production data of an international shipping company. Although Itanium Processor lost out to Xeon for various reasons over a period of time, I never believed that Itanium processors were such poor performers as against Xeon 64 Bit processors. This was a typical case of Software vendor blaming hardware and hardware vendor blaming software for poor performance.

The architecture was very complex. Those details are not relevant here.

Concerns were mainly related to very high CPU utilization, that sometimes used to touch 100% and server used to hang. There were unusual number of deadlocks, lock timeouts lock waits resulting transaction losses, with alarmingly high amount of resource utilization.

The customer was using a performance monitoring tool – An NMS / EMS - and those readings were being debated. There were also questions about the server sizing and the reasons behind choosing Intel Itanium 2 Processor.

When we were asked to take a look & share our views. We first took a look at the server sizing. We used the hardware vendor's sizer tool and arrived at the suggested server size for the given workload. The results were astonishing. The server being used was sufficient to handle the work loads of next 5 years at their estimations!! The same was being seconded by the hardware, Operating System and Database vendor.

This basic belief made us to take a look at the SQL Server logs, analyze the performance counters, actual transaction SQL Statements and batch jobs, types of workload and what we find was really surprising. Single SQL Server 2005 instance was handling the entire business workload, business logic, MIS reporting, Business Intelligence & Analytics simultaneously. Missing indexes were forcing the query execution plan to get recompiled almost every time, consuming the CPU & memory resources. This coupled with server side cursors that were holding the CPU & memory resources resulting in SQL Server Query Optimizer running out of internal resources to compile the query execution plan and errors were thrown. BI and Analytics workload with complex queries consisting of hundreds of operations such as nested joins, hashed joins, sorts, condition processing, ftp, URL, text processing etc., were executing during the peak business hours. Extremely high I/O operations due to server side cursors holding large data sets, working against the tables, coupled with missing indexes resulting in repeated expensive scans & were creating latch access violations. These resulted in deadlocks causing severe pressure on CPU & Memory resources which sometimes resulted in a hanged server.

Over and above, in order to reduce the deadlock situation, lowest isolation levels – read uncommitted – were used for many transaction queries. This coupled with repeated change in table structure had created missing & inconsistent data. These had created a severe pressure on the very business and business had to resort to their good old process to continue with their business.

Our role was only to give our views and recommendations and we did just that. Our recommendations were:
  1. to move the analysis & reporting workload out of the live production instance in to a different database and conduct ETL activities during off peak hours and
  2. optimize the queries, create indexes
  3. avoid changing the database structure often d) use different integration methodologies to integrate other applications

Once the recommendations were implemented by the customer, it was observed that the database was performing normally without much pressure on CPU & memory resources and they were not losing any transaction / data.

The second instance was a situation that the customer wanted to keep IBM DB2 Express Edition for sometime till migrating totally to Oracle ERP. The situation here was when nearly 700 users connect to the DB, server used to hang, and they were forced to restart the server.

When we were called into help them to optimize the performance and prevent server from getting restarted as the load increases. The easiest suggestion we could have given is to upgrade their license to either to DB2 Workgroup or DB2 Enterprise Edition. The customer was not interested in upgrading his license as suggested as DB2 was being used only till the migration to new ERP.

During our engagement with the customer we observed that 32 Bit edition of DB2 Express Version 8.1 was used. The server was a Virtual Machine on a 64 Bit hardware with 32 Bit Windows 2003 Standard Server. Considering the limitation of 32 Bit operating system and the CPU & memory limitations of DB2 Express Edition we concluded that it was the memory pressure that is causing CPU pressure as well resulting in hanged server and server had to be restarted. This was causing chaos across the organization as applications with DB2 were handling critical business operations.

Simple suggestions like removing unwanted data from Disk that cleared almost 40% of disk and this improved the performance a bit. Upgrading to Windows Server 2008 R2 64 Bit edition OS, DB2 Express Edition Version 8.1 to DB2 Express Edition Version 9.7 64 Bit did the trick. Without increasing the CPU & memory allocations DB2 started handling nearly 7000 concurrent connections. There was not even a single instance of hanged server and transaction loss. This was tested over a month's period before being declared live.

These two instances have taught us that there is no single formula to correct the poor performance. The problems can be anywhere – the hardware, the Operating System, The application design – literally across the infrastructure that is delivering the applications.

What do you say?


Porular Posts

SMARTDesktop - Microsoft Licensing & Total Cost of Ownership – Part 2

Samba4 Active Directory in action in a Commercial Bank in India

SMARTDesktop - Microsoft Licensing & Total Cost of Ownership – Part 1