Getting cpu cycles using RDTSC - why does the value of RDTSC always increase?


5 Answers

90%

I want to get the CPU cycles at a specific point. I use this function at that point:

Yes, it is referring to the absolute number of CPU cycles. – Gunther Piez Dec 22 '11 at 11:35

You can use it to measure cycles. Just make sure you run the CPU at 100% by loading it with work. – Johan Mar 5 '16 at 22:15

The problem with RDTSC is that, on an older multicore CPU or an older multi-CPU board, you have no guarantee that the counter starts at the same point in time on every core or on every CPU.
Modern systems usually do not have this problem, and on older systems it can be worked around by setting the thread's affinity so that it only runs on one CPU. This is not good for application performance, so one should not generally do it, but for measuring ticks it is just fine.
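
The affinity workaround described above can be sketched as follows. This is a minimal, Linux-only sketch using glibc's sched_getcpu() and sched_setaffinity(); the helper name pin_to_current_cpu is made up for illustration, and pinning to "whichever CPU the thread is on right now" is just one reasonable choice.

```c
#define _GNU_SOURCE   /* must precede all includes: exposes sched_getcpu, CPU_SET */
#include <sched.h>

/* Hypothetical helper: pin the calling thread to the CPU it is currently
 * running on, so later TSC reads all come from the same core.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int pin_to_current_cpu(void) {
    int cpu = sched_getcpu();          /* index of the CPU we are running on */
    if (cpu < 0) {
        return -1;
    }
    cpu_set_t set;
    CPU_ZERO(&set);                    /* start with an empty CPU mask */
    CPU_SET(cpu, &set);                /* allow only the current CPU */
    return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = calling thread */
}
```

Call pin_to_current_cpu() once before the first timestamp and check the return value; if it fails, the measurement can still run, but the two reads may come from different cores.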

(Another "problem" is that many people use RDTSC for measuring time, which is not what it does, but you wrote that you want CPU cycles, so that is fine. If you do use RDTSC to measure time, you may get surprises when power saving, frequency boosting, or any other of the many frequency-changing techniques kicks in. For actual time, the clock_gettime syscall is surprisingly good under Linux.)

88%

As long as your thread stays on the same CPU core, the RDTSC instruction will keep returning an increasing number until it wraps around. For a 2GHz CPU, this happens after 292 years, so it is not a real issue. You probably won't see it happen. If you expect to live that long, make sure your computer reboots, say, every 50 years.

People who were using it wrong from the start (people who used it to measure time and not cycles) complained a lot, and eventually convinced CPU manufacturers to standardise on making the TSC measure time and not cycles.

There's lots of confusing and/or wrong information about the TSC out there, so I thought I'd try to clear some of it up.

I would just write rdtsc inside the asm statement, which works just fine for me and is more readable than some obscure hex code. Assuming it's the correct hex code (and since it does not crash and returns an ever-increasing number, it seems so), your code is good.
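
The 292-year wrap-around figure can be checked with a few lines of arithmetic. This is a sketch assuming a 64-bit counter that ticks at a fixed rate and a year of 365.25 days; the function name tsc_wrap_years is made up for illustration.

```c
/* Hypothetical helper: years until a 64-bit counter wraps when it is
 * incremented ticks_per_second times per second. */
double tsc_wrap_years(double ticks_per_second) {
    const double total_ticks = 18446744073709551616.0;      /* 2^64 */
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;  /* 31,557,600 s */
    return total_ticks / ticks_per_second / seconds_per_year;
}
```

For 2e9 ticks per second this gives a little over 292 years, which matches the number quoted in the answer.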

72%

The Time Stamp Counter (TSC) is a 64-bit register present on all x86 processors since the Pentium. It counts the number of CPU cycles since its reset. The instruction RDTSC returns the TSC in EDX:EAX. In x86-64 mode, RDTSC also clears the upper 32 bits of RAX and RDX. Its opcode is 0F 31.[1] Pentium competitors such as the Cyrix 6x86 did not always have a TSC and may consider RDTSC an illegal instruction. Cyrix included a Time Stamp Counter in their MII.

Intel processor families increment the time-stamp counter differently.[5]

Other processors also have registers which count CPU clock cycles, but with different names. For instance, on the AVR32, it is called the Performance Clock Counter (PCCNT) register. SPARC V9 provides the TICK register. PowerPC provides the 64-bit TBR register.

[1] - Very simple C code to read the timer on an x86 machine. This reads the 64-bit value into two 32-bit integers and combines them; using just one 64-bit integer is another option.

65%

Starting with GCC 4.5, the __rdtsc() intrinsic is supported by both MSVC and GCC.

CPU feature flags (as listed, for example, in /proc/cpuinfo on Linux):
tsc - the TSC exists and rdtsc is supported. Baseline for x86-64.
rdtscp - rdtscp is supported.

Found this function but cannot get VS2010 to recognise the assembler. Do I need to include anything? (I believe I have to swap uint64_t to long long for Windows...?)

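#include <stdint.h>

/* GCC/Clang syntax. "=A" only means the EDX:EAX pair on 32-bit x86,
   so read the two halves explicitly; this works on x86 and x86-64. */
static inline uint64_t get_cycles(void) {
   uint32_t lo, hi;
   __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
   return ((uint64_t)hi << 32) | lo;
}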
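
MSVC does not accept GCC-style inline assembly, which is the likely reason VS2010 rejects the function above. A minimal sketch using the __rdtsc() intrinsic instead, assuming an x86 or x86-64 target; the wrapper name read_tsc is made up for illustration, and on a compiler without <stdint.h> the return type can be swapped for unsigned long long.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>      /* MSVC: declares __rdtsc() */
#else
#include <x86intrin.h>   /* GCC/Clang: declares __rdtsc() */
#endif

/* Hypothetical wrapper: returns the current 64-bit TSC value. */
uint64_t read_tsc(void) {
    return (uint64_t)__rdtsc();
}
```

Two consecutive calls on the same core should return a non-decreasing pair of values, which is the behaviour the question title asks about.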
75%

Fortunately, modern CPUs make it easy to read the CPU cycle counter. On x86, "rdtsc" and "rdtscp" are the instructions you want to use. You will find many examples and suggestions for using these instructions when searching the Web. However, a good tutorial is an Intel whitepaper by Gabriele Paoloni: How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures (September 2010).

Let's take reading the cycle counter as an example. Paoloni's paper gives code for taking the start and end timestamps using "rdtscp".

On Linux, you can use objdump to disassemble your code, and on Windows you can use dumpbin. Or you can load your executable into a debugger and use it to show the instructions in the code you are measuring.

Note: These issues only matter when measuring fine-grained operations. When measuring I/O devices, for instance, they do not hurt: the effect of out-of-order execution of "rdtsc(p)" and adjacent instructions is vastly dwarfed by the time of what you are measuring.
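
A start/end timestamp pair can be sketched roughly as follows. This is not the code from Paoloni's paper: it swaps in the GCC/Clang intrinsics __rdtsc(), __rdtscp() and _mm_lfence() as one common way to keep the timed region from being reordered around the reads. It assumes an x86-64 CPU that supports rdtscp, and the names sum_to and measure_ticks are made up for illustration.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang, x86 only) */

/* Hypothetical workload, present only so there is something to time. */
uint64_t sum_to(uint64_t n) {
    volatile uint64_t acc = 0;       /* volatile keeps the loop from being removed */
    for (uint64_t i = 1; i <= n; i++) {
        acc += i;
    }
    return acc;
}

/* Returns the number of TSC ticks that elapsed while running sum_to(n). */
uint64_t measure_ticks(uint64_t n) {
    unsigned int aux;                /* receives IA32_TSC_AUX; not used here */
    _mm_lfence();                    /* let earlier instructions finish first */
    uint64_t start = __rdtsc();
    _mm_lfence();                    /* do not start the workload before the read */
    (void)sum_to(n);
    uint64_t end = __rdtscp(&aux);   /* waits for preceding instructions to execute */
    _mm_lfence();                    /* keep later instructions out of the region */
    return end - start;
}
```

The result is in TSC ticks, not nanoseconds, and it includes the overhead of the reads and fences themselves; for very short regions that overhead should be measured separately and subtracted.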

flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
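
A flags line like the one above can also be checked programmatically. The sketch below is Linux-only and assumes the "flags" line format of /proc/cpuinfo on x86; the helper names has_flag and cpu_has_flag are made up for illustration. Flags worth looking for include tsc and rdtscp, plus constant_tsc (the TSC ticks at a fixed rate regardless of frequency changes) and nonstop_tsc (the TSC keeps running in deep sleep states).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` appears as a whole whitespace-delimited word in `line`,
 * so that "tsc" does not match inside "rdtscp" or "constant_tsc". */
bool has_flag(const char *line, const char *flag) {
    size_t n = strlen(flag);
    if (n == 0) {
        return false;
    }
    for (const char *p = strstr(line, flag); p != NULL; p = strstr(p + 1, flag)) {
        bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends) {
            return true;
        }
    }
    return false;
}

/* Looks for `flag` in the first "flags" line of /proc/cpuinfo.
 * Returns false if the file cannot be opened or has no such line. */
bool cpu_has_flag(const char *flag) {
    char line[8192];
    bool found = false;
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL) {
        return false;
    }
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            found = has_flag(line, flag);
            break;
        }
    }
    fclose(f);
    return found;
}
```

For example, cpu_has_flag("rdtscp") can gate the use of rdtscp, falling back to rdtsc when the flag is absent.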