AMD Developer Central

“Istanbul” Zone

The new AMD “Istanbul” processors build on the foundation laid by the AMD “Barcelona” and “Shanghai” Family 10h processors and add some key technology advancements. With “Barcelona,” we introduced an array of innovations in processor design and features, including a native quad-core architecture and a new L3 cache shared across the processor cores. The AMD “Shanghai” release brought additional enhancements, including improved scalability and availability and a larger L3 cache. Now, with the release of the AMD “Istanbul” processor, there are even more enhancements for software developers: an even larger shared L3 cache, a total of six physical cores on die, a new probe filter called HT Assist that helps increase bandwidth, and several new power features to keep the system running cool.

A number of software-visible features can be leveraged to make your applications perform better and scale across multiple cores. Visit this page regularly for updated information and practical guidance on how to take advantage of all the new features in the “Barcelona”, “Shanghai”, and “Istanbul” Family 10h processors.

» Software Development Tools and Resources
» Overview of Software Visible Features
» Documentation
» Technical Articles & Blogs
» Benchmarks and Performance Evaluations
» Related Resources

 
Software Development Tools and Resources
The following software development tools and resources have been optimized for AMD “Barcelona”, “Shanghai” and “Istanbul” Family 10h processors:

AMD Core Math Library (ACML)
ACML is specifically designed to support multi-threading and other key features of AMD’s next-generation processors. ACML currently supports OpenMP and features hand-tuned “Barcelona”, “Shanghai” and “Istanbul” support for the SGEMM and DGEMM matrix multiplication routines and for the CFFT complex-complex Fast Fourier Transforms. The newly released ACML 4.2.0 includes further tuning of DGEMM and improved performance on 3D FFTs.
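
For readers unfamiliar with the routine names, the following is a minimal reference sketch of what a DGEMM call computes under the standard BLAS definition, C := alpha*A*B + beta*C, without the transpose options. It is not ACML's interface and says nothing about how ACML tunes the routine; the function name dgemm_reference and the list-of-lists layout are assumptions made for illustration only.

```python
def dgemm_reference(alpha, a, b, beta, c):
    """Return alpha * (a x b) + beta * c for row-major lists of floats.

    Plain reference arithmetic only; a tuned library reaches the same
    result with blocking, SIMD and multi-threading.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must match rows of b")
    if len(c) != rows or any(len(row) != cols for row in c):
        raise ValueError("c must be rows(a) x cols(b)")
    result = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Dot product of row i of a with column j of b.
            dot = sum(a[i][k] * b[k][j] for k in range(inner))
            out_row.append(alpha * dot + beta * c[i][j])
        result.append(out_row)
    return result
```

A reference like this is mainly useful for checking the output of an optimized DGEMM on small matrices.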

AMD CodeAnalyst Performance Analyzer
“Shanghai” built upon the Instruction-Based Sampling (IBS) functionality that was originally introduced in “Barcelona” by adding a new mode of operation that enhances IBS op sampling. In addition to using processor cycles to select ops for monitoring and sampling, the new mode counts ops as they are dispatched and uses the count to decide when an op should be selected for monitoring and sampling. The new mode greatly improves the statistical distribution of profile data, which helps software developers interpret and apply IBS data. It is also supported in the “Istanbul” processor.

AMD CodeAnalyst Performance Analyzer also supports the small number of new performance events and event unit masks on “Shanghai” and “Istanbul” processors.
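
The difference between the two IBS op sampling modes described above can be illustrated with a toy model. The sketch below is not the hardware algorithm: the trace, the intervals, and the function names are all assumptions chosen to show why selecting ops by a dispatched-op count tends to spread samples more evenly over the ops than selecting by elapsed cycles.

```python
from collections import Counter


def build_trace():
    """Return a list of cycles; each cycle lists the ops dispatched in it.

    Hypothetical workload: 300 "fast" ops dispatched 3 per cycle, then
    300 "slow" ops dispatched 1 every 3 cycles.
    """
    fast = [["fast"] * 3 for _ in range(100)]
    slow = []
    for _ in range(300):
        slow.extend([[], [], ["slow"]])
    return fast + slow


def sample_by_cycles(trace, interval):
    """Arm the sampler every `interval` cycles; tag the next dispatched op."""
    samples = Counter()
    armed = False
    for cycle, ops in enumerate(trace, start=1):
        if cycle % interval == 0:
            armed = True
        if armed and ops:
            samples[ops[0]] += 1
            armed = False
    return samples


def sample_by_ops(trace, interval):
    """Tag every `interval`-th op, counting ops as they are dispatched."""
    samples = Counter()
    count = 0
    for ops in trace:
        for op in ops:
            count += 1
            if count % interval == 0:
                samples[op] += 1
    return samples


if __name__ == "__main__":
    trace = build_trace()
    print("cycle-based:", dict(sample_by_cycles(trace, 10)))
    print("op-based:   ", dict(sample_by_ops(trace, 30)))
```

In this model both phases contain the same number of ops, yet the cycle-based selection takes most of its samples from the slow phase, while the op-count selection samples both phases equally.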

Framewave
The Framewave open source library is optimized to yield maximum performance on x86 and AMD64 hardware architectures. Current implementations exploit multicore architecture and single instruction multiple data (SIMD) instructions. Specifically, streaming SIMD extensions and AMD Family 10h technologies are used to optimize for speed. Please download the latest Framewave version from SourceForge to experience the best performance.

GNU Toolset
The GNU Toolset, including the GCC compiler, the glibc project, and the binutils, has been optimized for AMD Family 10h processors, including “Shanghai” and “Barcelona.”

Microsoft Visual Studio® compilers
The Visual Studio 2008 tools feature improved instruction selection, optimized register allocation, and enhanced 128-bit floating-point performance when used with AMD Third-Generation Opteron processors.

x86 Open64 Compiler Suite
The x86 Open64 compiler system is a high-performance, production-quality code generation tool designed for high-performance parallel computing workloads. The x86 Open64 environment gives developers the essential choices when building and optimizing C, C++, and Fortran applications targeting 32-bit and 64-bit Linux platforms.

PGI compilers
PGI compilers and tools enable maximum overall performance on multi-core AMD64 processors through auto-parallelization and OpenMP directive-based parallel programming. New options and optimizations improve peak SPEC CPU2006 performance by 5-6% over the previous release, 7.1, running on quad-core AMD Opteron processors.

Sun Studio compilers
The latest version of the Sun Studio compilers contains performance improvements to better support AMD’s “Barcelona” and “Shanghai” processors, including compiler optimization flags for best performing code.

More Optimized Partner Tools

Absoft
The Absoft Pro Fortran tool suite provides maximum application speed on multi-core AMD64 processors, excellent legacy code support, F2003 extensions, superior debugging and a complete integrated Fortran/C++ development environment. It includes optimizers for “Barcelona” and “Shanghai”, along with math libraries and graphics.

Allinea
Allinea's tools for multi-core and high performance computing (HPC) set new standards for affordability and ease-of-use in parallel and multi-core programming.  New product features aimed at multi-threaded applications and novel computing architectures take advantage of the AMD "Shanghai" processor's powerful new features.

» See all

 
Overview of Software Visible Features
Feature flags for new functions:
  • Fire & forget dynamic O/S P-state support
  • Misaligned SSE access
  • OS Visible workaround register
  • Instruction-based sampling
  • SVM lock
  • Nested Paging
  • L3 cache size
  • 128-bit FPU
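
As a rough illustration of how software might test for the features listed above, the sketch below decodes raw CPUID register values. The bit positions reflect the AMD CPUID Specification as best understood here and should be verified against the current revision. Mapping "Fire & forget dynamic O/S P-state support" to the HwPstate bit is an assumption, and the function name and the (leaf, register) dictionary layout are hypothetical. Python cannot execute CPUID itself, so the register values are assumed to come from a separate dump or native helper.

```python
def bit(value, position):
    """Return True if the given bit is set in a 32-bit register value."""
    return bool((value >> position) & 1)


def decode_feature_flags(regs):
    """Decode Family 10h feature flags from raw CPUID register values.

    `regs` maps (leaf, register_name) to a 32-bit integer, for example
    {(0x80000001, "ecx"): 0x000007FF}. Missing entries are treated as 0.
    """
    ext_ecx = regs.get((0x80000001, "ecx"), 0)   # extended feature bits
    l3_edx = regs.get((0x80000006, "edx"), 0)    # L3 cache identifiers
    apm_edx = regs.get((0x80000007, "edx"), 0)   # power management
    svm_edx = regs.get((0x8000000A, "edx"), 0)   # SVM feature bits
    perf_eax = regs.get((0x8000001A, "eax"), 0)  # performance hints
    return {
        "hw_pstate": bit(apm_edx, 7),        # assumed match for "fire & forget"
        "misaligned_sse": bit(ext_ecx, 7),
        "osvw": bit(ext_ecx, 9),
        "ibs": bit(ext_ecx, 10),
        "svm_lock": bit(svm_edx, 2),
        "nested_paging": bit(svm_edx, 0),
        # Bits 31:18 give the L3 size in 512 KB units (a lower bound).
        "l3_size_kb": ((l3_edx >> 18) & 0x3FFF) * 512,
        "fp128": bit(perf_eax, 0),
    }
```

Before reading the SVM or performance leaves on real hardware, code should also confirm that the largest supported extended leaf covers them.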

Feature identification bits for new instructions:
  • MONITOR/MWAIT
  • LZCNT
  • POPCNT
  • SSE4a Instructions
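
On Linux, one hedged way to check for these instructions is to read the flag names the kernel reports in /proc/cpuinfo (monitor, abm, popcnt, sse4a). The sketch below parses such text and also gives small reference models of what POPCNT and LZCNT return. The function names are hypothetical, and the models describe the arithmetic result only, not flags or timing.

```python
# Linux /proc/cpuinfo flag name for each instruction group.
INSTRUCTION_FLAGS = {
    "MONITOR/MWAIT": "monitor",
    "LZCNT": "abm",       # LZCNT is reported through the ABM feature bit
    "POPCNT": "popcnt",
    "SSE4a": "sse4a",
}


def instruction_support(cpuinfo_text):
    """Map each instruction group to True/False from /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            break  # the first logical CPU is enough for this sketch
    return {name: flag in flags for name, flag in INSTRUCTION_FLAGS.items()}


def popcnt(value, width=32):
    """Reference model: number of set bits in a `width`-bit operand."""
    return bin(value & ((1 << width) - 1)).count("1")


def lzcnt(value, width=32):
    """Reference model: number of leading zero bits; `width` for zero input."""
    value &= (1 << width) - 1
    return width - value.bit_length()
```

To try it on a Linux system, pass the contents of /proc/cpuinfo to instruction_support. On other operating systems the flag names and their source will differ.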
 
Documentation
Technical Articles & Blogs
Five years ago, AMD shook up the x86 processor by putting a memory controller directly on-chip. Now, AMD breaks new ground again with an innovative cache strategy. Inside AMD's three-tier cache design.
» Barcelona's Innovative Architecture Is Driven by a New Shared Cache

New features in AMD’s upcoming Barcelona chip dramatically boost performance of floating-point arithmetic and greatly accelerate access to cache.
» SSE128: AMD’s New Floating-Point Enhancements

Take advantage of the many architectural innovations in the "Barcelona" processor through Orcas-based tools and AMD libraries.
» Develop Blazing Fast Code with Microsoft Visual Studio® 2008 (code-named “Orcas”) and AMD Tools

AMD’s new chip architecture extends a long tradition of giving developers the features they need to execute their code blindingly fast. What's in it for you?
» Going to Barcelona: A Modern Architecture for Breakthrough Software Performance

AMD “Shanghai” (Family 10h) Processor Software Visible Features blog series

New “Istanbul” blogs
» “Shanghai” zone is now “Istanbul” zone
» “Istanbul” overview

Previous “Shanghai” blogs
» Transition from “Barcelona” to “Shanghai”
» Larger L3 Cache
» Improved Reliability, Availability, Scalability

Previous “Barcelona” blogs
» Welcome
» Shared L3 Cache
» CPUID
» Instruction-Based Sampling (IBS)
» MONITOR/MWAIT
» SSE Misaligned Access
» SSE4a Instruction Set, Part 1
» SSE4a Instruction Set, Part 2
» Sideband Stack Optimizer
» 128-bit FPU
» Advanced Bit Manipulation (ABM)

Benchmarks and Performance Evaluations
Virtualization
Shanghai-based Dell systems take top scores for VMmark 8-core and 16-core systems.
» http://www.vmware.com/products/vmmark/results.html

This VMware performance white paper evaluating RVI performance with the Shanghai processor concludes that "the current VMware VMM leverages these features quite well, resulting in performance gains of up to 42% for MMU-intensive benchmarks and up to 500% for MMU-intensive microbenchmarks."
» http://www.vmware.com/resources/techresources/1079

HP ProLiant DL585 G5 earns #1 virtualization performance record on VMmark benchmark.
» ftp://ftp.compaq.com/pub/products/servers/benchmarks/proliant_dl585_vmmark_080408.pdf

The very first independent Nested Paging Virtualization tests (2 socket servers running Xen with database and web serving workloads and featuring AMD-V (RVI)).
» http://www.anandtech.com/weblog/showpost.aspx?i=467

HPC
“Jaguar,” the AMD Opteron-based system by Cray at Oak Ridge National Labs, is the first entirely x86-based system to break the Petaflop barrier.
» http://www.marketwatch.com/news/story/Cray-Supercomputer-Oak-Ridge-Smashes/story.aspx?guid=%7B25D20E9B-D6BD-4CA5-B7F6-3484D9616D7C%7D

Web Serving
HP ProLiant DL585 G5 and DL385 G5 AMD Opteron servers lead with 4P, 2P world record performances on the SPECweb®2005 Benchmark.
» ftp://ftp.compaq.com/pub/products/servers/benchmarks/hp_proliant_dl585_385_specweb2006_073008.pdf
(Please note that Dual-Core AMD Opteron processors also hold the SPECweb2005 performance records for 2P and 4P servers.)

Database
An 8-socket Shanghai-based HP system achieves the top x86-based score with Oracle, and a 2-socket Shanghai-based HP system achieves the top x86-based score with SQL Server 2005.
» http://www.sap.com/solutions/benchmark/sd2tier.epx

AnandTech is "quite surprised that Shanghai was able to meet and, in some cases, pass Harpertown at various workload levels in some of the benchmarks."
» http://www.anandtech.com/showdoc.aspx?i=3456&p=7

HP ProLiant DL585 G5 with Quad-Core AMD Opteron processors takes #1 4-socket worldwide price/performance record again on TPC-C benchmark.
» ftp://ftp.compaq.com/pub/products/servers/benchmarks/hp_proliant%20dl585_tpc_080208.pdf

HP ProLiant DL785 G5 achieves #1 8P non-clustered performance and price/performance on TPC-H@300GB benchmark.
» ftp://ftp.compaq.com/pub/products/servers/benchmarks/dl785g5-tpch300gb-0708.pdf

Business Applications
HP ProLiant BL465c G5 server blade posts HP’s first Quad-Core AMD Opteron™ blade result on Oracle Applications Standard Benchmark (small model, single DB instance).
» ftp://ftp.compaq.com/pub/products/servers/benchmarks/hp_proliant_bl460c%20_siebel_perf_brief_051408.pdf

HP ProLiant DL585 G5 achieves #1 4-processor Windows result on two-tier SAP® Sales and Distribution Standard Application Benchmark.
» ftp://ftp.compaq.com/pub/products/servers/benchmarks/dl585g5_2tsapsd_071408.pdf

HP ProLiant DL785 G5 takes #1 8-processor Windows result with new Quad-Core AMD Opteron™ processors on two-tier SAP® Sales and Distribution Standard Application Benchmark.
» ftp://ftp.compaq.com/pub/products/servers/benchmarks/dl785g5_2tsapsd_may08.pdf

HP ProLiant servers show excellent performance scalability with new Quad-Core AMD Opteron processors on two-tier SAP® Sales and Distribution (SD) Standard Application Benchmark (2 socket and 4 socket blades and servers).
» ftp://ftp.compaq.com/pub/products/servers/benchmarks/HP_ProLiant_DL385_BL685c_2tSAPSD_March2708.pdf

Java Application Serving
Quad-Core AMD Opteron processor-based Sun X4600 server sets x86 SPECjbb2005 world record (8 socket server).
» http://www.sun.com/aboutsun/pr/2008-08/sunflash.20080807.1.xml

Floating Point Performance
HP ProLiant DL585 G5 server with latest Quad-Core AMD Opteron™ processors takes overall x86_64 records on SPEC® CPU2006 benchmark.
» ftp://ftp.compaq.com/pub/products/servers/benchmarks/dl585_g5_speccpu2006_july08.pdf

Related Resources