Project T5220, try 1: How NOT to configure a T5220 as a complete Oracle/Weblogic development environment

Currently I have to configure a Sun Enterprise T5220 as our new "development environment", replacing our V440 (Oracle) and the T1000 (Weblogic). We chose a Sun CMT machine because we wanted to stay close to production in terms of architecture (processor type, OS, etc.).

The machine will have to run an Oracle instance and some containers for the Weblogic environments (we have an integration environment, a daily-build environment, plus some development environments for projects). The new T5220 is packed with a 1.4GHz T2 Niagara, 64GB of memory, 2x146GB disks for the OSes, and 6x300GB disks for the database and NFS.

Given that, the ideal setup looks like this:

  • 2 guest LDOMs:
      • 1 Oracle LDOM
      • 1 Weblogic LDOM, using Solaris zones to separate the environments
  • The setup is 100% ZFS
  • The control domain runs on one slice of the 146GB drives, mirrored with ZFS (raid1)
  • The guest domain roots run on the other slice of those drives, in a separate zpool, exported as ZVOLs
  • The 6x300GB disks are exported as raw disk slices (EFI labeled) and formed into a ZFS raidz inside the Oracle LDOM
  • Inside that raidz zpool, there is a ZFS filesystem with recordsize=8k for the Oracle datafiles, and one with a 128k recordsize for the redo logs (see the sketch below)
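
For reference, a minimal sketch of what that storage layout boils down to. Device paths, volume names, sizes and the guest domain name are placeholders, not our exact configuration:

# In the control domain: guest root on a ZVOL carved out of the ldompool
zfs create -V 20g ldompool/oracle-root
ldm add-vdsdev /dev/zvol/dsk/ldompool/oracle-root oraroot@primary-vds0
ldm add-vdisk oraroot oraroot@primary-vds0 oracle-ldom

# Export the six 300GB disks as raw slices to the Oracle LDOM
# (repeat add-vdsdev/add-vdisk for the remaining five disks)
ldm add-vdsdev /dev/dsk/c1t2d0s0 oradisk1@primary-vds0
ldm add-vdisk oradisk1 oradisk1@primary-vds0 oracle-ldom

# Inside the Oracle LDOM: raidz pool plus the two differently tuned filesystems
zpool create orapool raidz c0d1 c0d2 c0d3 c0d4 c0d5 c0d6
zfs create -o recordsize=8k orapool/oradata
zfs create -o recordsize=128k orapool/redo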

Regarding the database, this is pretty similar to our current setup, a SunFire V440 with an ST2540 FC storage array attached, running ZFS on top of the hardware RAID5. Similar in terms of the filesystem, at least. ZFS runs very well on the V440.

Okay, now, after setting this up, it turned out that the DB performance is unacceptable. Absolutely horrible, to be honest. We do large imports of our production database on a regular basis; on the V440 an import takes about 50 minutes, on the T5220 we're now up to 150 minutes.

Here are some numbers. Simple dd testing: creating a 15GB file.

V440 with an ST2540 RAID5 volume, exported via 2x2Gbit FC and configured as a simple ZFS pool:

$ time dd if=/dev/zero of=ddtest.file bs=8k count=2000000
2000000+0 records in
2000000+0 records out

real    2m32.380s
user    0m3.021s
sys     1m27.533s
$ echo "((16384000000/152)/1024)/1024"|bc -l
102.79605263157894736842

T5220 with local 10k rpm SAS disks, exported as raw disk slices into the guest LDOM and configured as a ZFS raidz:

$ echo "((16384000000/336)/1024)/1024"|bc -l
46.50297619047619047619

Now things get strange. The same test inside the control domain, directly onto the ldompool (with a smaller 1.5GB file this time):

$ echo "((1638400000/35)/1024)/1024"|bc -l
44.64285714285714285714

WTF?

That is worse than what I get with a Linux guest in VirtualBox on a virtual drive. Not acceptable for 300€ 10k rpm SAS drives.

I'm asking for help and further info in the Sun forums.

Anyway, I'm going to attach the documented step-by-step guide on how to set up all this from scratch (M$-Word .doc, in German, sorry).

Attachment: EWU-Installation-Guide.doc

Comments

Meanwhile I have reconfigured the control domain as the global zone (i.e. I purged all LDOM settings and the LDoms 1.3 software) and set up zones instead. As soon as I had rebooted into factory-default, I got 100MB/s on the above-mentioned ldompool, and no stalls at all. The throughput on the raidz is fine as well.
Anyway, while creating the Oracle datafiles, I saw that writing them to disk was causing a load of 7.5 in the default CPU pool (Oracle is running in a dedicated pool, of course).
The control domain had only 4 vcpus assigned, and if I'm right, a load of 7 means ZFS was using 7 threads (i.e. 7 vcpus) to do its work. I did not see that in prstat in the control domain, but poolstat revealed it for pool_default (still, prstat didn't show any load or CPU usage, which is weird). Maybe that explains something. But I'm pleased with the zone setup now. It works just as I expect things to work.
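
For the record, a minimal sketch of the kind of dedicated-pool setup mentioned above. Pool, pset and zone names as well as the CPU counts are illustrative, not our exact configuration:

# Enable and save the resource pools facility, then create a dedicated pset/pool for Oracle
pooladm -e
pooladm -s
poolcfg -c 'create pset oracle_pset (uint pset.min = 16; uint pset.max = 16)'
poolcfg -c 'create pool oracle_pool'
poolcfg -c 'associate pool oracle_pool (pset oracle_pset)'
pooladm -c

# Bind the Oracle zone to that pool
zonecfg -z oracle 'set pool=oracle_pool'

# Watch per-pool load - this is where the 7.5 in pool_default showed up
poolstat 5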

Hi Thomas,
I saw your post on the Oracle/Solaris forum and then found your site. I'm currently discussing internally at my company whether to start deploying smaller production systems on single-node Oracle DB setups - normally we use 2-node Oracle Enterprise RAC setups with a shared SAN. My current architecture uses a T5220 with 4 x 300GB 10K disks and hardware RAID. For the development environment setup you describe, did you find that the external SAN performance was dramatically better than using the internal disks on the T5220 setup you moved to?
I'm trying to inject a bit of sense into the internal conversation.
Many thanks for any input you can provide from your experience.
Davy

Hi Davy,

I think it depends on the quality of your SAN and the way you attach it. The 125k€ SAN we run our ESX4 farm on does perform "a bit" better. So it also depends on your budget.

What I described in the post above and in the comment was the incredibly huge difference between LDOMs and zones on that internal array. When using zones, there was no big difference compared to the external ST2540 FC storage. What _really_ hurt Oracle performance was the switch from the 1.5GHz UltraSPARC IIIi to that 1.4GHz CMT processor. We had to parallelize schema statistics gathering and index creation for imports because of the T2's horrible single-thread performance (roughly as sketched below). But other than that, things are performing absolutely okay for our development environment. In the end it also depends on your workload.
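
To give an idea of what "parallelize" means here - a rough sketch, assuming a Data Pump import; schema name, file names and the parallel degrees are just placeholders:

# Import with several parallel worker processes instead of a single stream
impdp system DIRECTORY=dp_dir DUMPFILE=prod_%U.dmp SCHEMAS=appschema \
      PARALLEL=8 LOGFILE=imp_prod.log

# Gather schema statistics with a parallel degree, so the work is spread
# across many of the T2's hardware threads instead of one slow one
sqlplus -s system <<'EOF'
exec DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'APPSCHEMA', degree => 16);
EOF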

Cheers,

Thomas