How did reading and writing skills evolve in humans?

Derek Hodgson has proposed a new idea about how the use of signs, and ultimately written and spoken language, evolved in early human societies.
In this article we answer the question: how did reading and writing skills evolve in humans? The visual cortex, the part of the brain responsible for processing visual information, evolved over millions of years in a world without reading or writing. How writing emerged only around 5,000 years ago, and how our brains acquired the special ability to recognize letters, has therefore long been a mystery. Some researchers believe the key to understanding this transition lies in determining how and why humans first began to make repetitive markings. Imaging the visual cortex of people reading text has provided important insight into how the brain perceives simple patterns.
In a new article published in Journal of Archaeological Science: Reports, Derek Hodgson argues that the earliest patterns made by humans were aesthetic rather than symbolic, and he explains what this means for the evolution of reading and writing. Archaeologists have discovered a large number of ancient patterns engraved by early humans, including both Neanderthals and Homo sapiens. These marks were created thousands of years before the appearance of the first representational art (drawings that depict something).
Early markings. Top, left to right: the engraved Trinil shell and two engravings from Blombos Cave in South Africa. Middle: engraved ostrich eggshell from South Africa. Bottom: a Neanderthal engraving on a stone surface in Gibraltar.
Engravings about a hundred thousand years old have been discovered in South Africa, and archaeologists have also found a seashell engraved by Homo erectus dating back roughly 540,000 years. A curious feature of these early marks is that they all use grid-like motifs, angles, and repeating lines.
Derek Hodgson proposed in 2001 that the way information is processed by the brain's primary visual cortex gave rise to the ability to engrave simple patterns. This region of the brain contains neurons that code for basic shapes such as edges, lines, and T-junctions, and these elementary shapes preferentially activate the visual cortex. It is easy to see why: lines, angles, and intersections are the most abundant cues embedded in the natural environment, and they are the first important clues to the layout of objects. Our brain's ability to process them is shared with other primates, but the human brain can also respond to these cues using Gestalt principles, the rules that let the mind automatically perceive patterns in a stimulus. When these basic shapes are produced, higher-order visual areas of the brain process them, and the brain can interpret them as real objects.
Around 700,000 years ago, this sensitivity to geometry and pattern enabled humans to begin making tools with a degree of symmetry, work that would probably not have been possible without some understanding of geometry. Tool-making in turn made our ancestors more sensitive and attuned to patterns in the natural environment, so they began reproducing these patterns on materials other than tools, incidentally making marks on rocks, shells, and other surfaces.
At some point, these unintentional patterns began to be copied deliberately onto such materials, developing into designs and eventually into writing. But how was this possible? Neuroscience research has shown that the premotor cortex of the brain, which guides manual skills, plays a role in the process of writing.
According to this theory, reading and writing evolved when our passive visual understanding began to interact with our manual skills for making things.
Blombos Cave engraving, South Africa, about 77,000 years old
Abstract and written patterns also activate the brain's mirror neurons. These brain cells are interesting because they fire both when we act and when we watch others act, helping us perceive what others are doing as if we were doing it ourselves; they play a key role in imitation. But these neurons are also activated when we see patterns and written text. This creates a sense of recognition of a pattern (whether accidental or natural) that motivates us to copy it, and such copying may have been the first step toward reading and writing.
Ultimately, these developments enabled the brain to use the visual cortex for an entirely new purpose. This process may eventually have given rise to a brain region dedicated to the visual recognition of words, one that gradually came to communicate with the areas responsible for speech.
Hodgson also notes that some scholars believe the early signs were symbolic rather than aesthetic, and that writing evolved from the information encoded in them. In his opinion, however, this is unlikely: the earliest signs remained similar over very long periods. If they were symbolic, we would expect them to vary widely across time and space, as modern writing systems do, but this is not the case.
According to Hodgson, all of this evidence points to the possibility that the first marks were aesthetic, a byproduct of the visual cortex's prioritization of primitive forms, and that the process may have begun in the era of Homo erectus, roughly 1.8 million to 500,000 years ago.
How does the central processing unit (CPU), which manages the execution of all instructions and is often called the brain of the computer, actually work, and what are its components?
What is a CPU? Everything you need to know about processors
The central processing unit (CPU) is a vital element of any computer: it manages all the calculations and instructions passed to the other components of the computer and its peripherals. Almost every electronic device and gadget you use, from desktops, laptops, and phones to gaming consoles and smartwatches, contains a CPU. This unit is so fundamental that without it a system will not even turn on, let alone be usable. The speed of the CPU governs how quickly input commands are executed, and the other components of the computer only do useful work under its direction.
Table of contents
What is a processor?
Processor performance
Operating units of processors
Processor architecture
Set of instructions
RISC vs. CISC or ARM vs. x86
A brief history of processor architecture
ARM and X86-64 architecture differences
Processor performance indicators
Processor frequency
Cache memory
Processing cores
Difference between single-core and multi-core processing
Processing threads
What is Hyper-Threading or SMT?
CPU in gaming
What is a bottleneck?
Setting up a balanced system
Since the CPU manages data from all parts of the computer simultaneously, it can slow down, or even stall or crash, as the volume of calculations and the workload grow. The most common CPUs on the market today are built from semiconductor components on integrated circuits and are sold in many varieties; the leading manufacturers in this industry, AMD and Intel, have been competing in this field for more than 50 years.
What is a processor?
To get to know the central processing unit (CPU), we first briefly introduce a part of the computer called the SoC. An SoC, or system on a chip, integrates all the components a computer needs for processing onto a single silicon chip. The SoC contains various modules, of which the CPU is the main one; the GPU, memory, USB controllers, power-management circuits, and wireless radios (Wi-Fi, 3G, 4G LTE, and so on) are additional components that may or may not be present on a given SoC. The central processing unit, which from here on we will simply call the processor, cannot process instructions independently of other chips; a complete computer requires the rest of the SoC.
The SoC is slightly larger than the CPU alone, yet offers much more functionality. In fact, despite the great emphasis placed on processor technology and performance, the CPU is not a computer by itself; it is best described as a very fast calculator within the system on a chip: it retrieves data from memory and then performs arithmetic (addition, multiplication) or logical (AND, OR, NOT) operations on it.
Processor performance
Processing an instruction in the processor involves four main steps, executed in order:
Fetching instructions from memory (Fetch): the processor first retrieves instructions from memory so that it knows how to handle its input. The input may consist of one or many commands, each stored at a separate address. A unit called the program counter (PC) keeps track of the order of instructions, and the processor continually communicates with RAM to find each instruction's address (reading from memory).
Decoding or translating instructions (Decode): instructions must be in a form the processor understands, namely machine language, or binary. Writing programs directly in binary has been impractical from the very beginning, so code is written in simpler programming languages, and a program such as an assembler converts those commands into executable machine code ready for the processor.
Executing the translated instructions (Execute): the most important step in the processor's operation, in which the decoded binary instructions are carried out, with arithmetic and logical operations performed by the ALU (Arithmetic and Logic Unit).
Storing the results of execution (Store): the results and output of instructions are stored in the processor's registers so that later instructions can refer to them quickly, or are written back out to memory (writing to memory).
The process described above is called the fetch-execute cycle, and it repeats billions of times per second on a modern chip; each time the four main steps complete, the next instruction takes its turn and all the steps run again from the beginning, until every instruction has been processed.
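The four steps above can be sketched in a few lines of Python. The three-field instruction format, the mnemonics, and the dictionary-based memory are all invented for this illustration; real processors work on binary encodings.

```python
# Toy fetch-decode-execute loop (illustrative, not a real ISA).

def run(program, memory):
    pc = 0                      # program counter: index of the next instruction
    registers = [0] * 4         # small register file for intermediate results
    while pc < len(program):
        instr = program[pc]     # 1. fetch the instruction at the PC
        pc += 1
        op, a, b, dest = instr  # 2. decode into operation and operands
        if op == "LOAD":        # 3. execute the decoded operation...
            registers[dest] = memory[a]
        elif op == "ADD":
            registers[dest] = registers[a] + registers[b]
        elif op == "STORE":     # 4. store: write a result back to memory
            memory[a] = registers[b]
    return memory

memory = {0: 2, 1: 3, 2: 0}
program = [
    ("LOAD", 0, None, 0),   # r0 = mem[0]
    ("LOAD", 1, None, 1),   # r1 = mem[1]
    ("ADD", 0, 1, 2),       # r2 = r0 + r1
    ("STORE", 2, 2, None),  # mem[2] = r2
]
print(run(program, memory)[2])  # → 5
```

Each loop iteration is one pass through fetch, decode, execute, and store, exactly as described above.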
Operating units of processors
Each processor consists of three operational units that play a role in the process of processing instructions:
Arithmetic and Logic Unit (ALU): a complex digital circuit that performs arithmetic and comparison operations; in some processors the ALU is split into an AU (for arithmetic operations) and an LU (for logical operations).
Control Unit (CU): a circuit that directs and manages operations within the processor, telling the arithmetic and logic unit and the input/output devices how to respond to instructions. The behavior of the control unit differs from processor to processor, depending on the design architecture.
Register unit: registers temporarily store processed data, instructions, addresses, bit sequences, and output, and must have sufficient capacity for that data. Processors with a 64-bit architecture have 64-bit registers, and processors with a 32-bit architecture have 32-bit registers.
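As a sketch, the ALU just described can be modeled as a pure function from an operation code and operands to a result. The operation names and the fixed register width are assumptions for illustration; note how a fixed-width register makes results wrap around on overflow.

```python
# Toy ALU: arithmetic and logic operations with a fixed register width.

def alu(op, a, b=0, width=8):
    mask = (1 << width) - 1           # registers have a fixed bit width,
    if op == "ADD":                   # so results wrap around on overflow
        return (a + b) & mask
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    if op == "NOT":
        return ~a & mask
    raise ValueError(f"unknown operation: {op}")

print(alu("ADD", 250, 10))          # → 4 (260 wraps around in an 8-bit register)
print(alu("AND", 0b1100, 0b1010))   # → 8 (binary 1000)
```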
Processor architecture
The relationship between the instruction set and the processor's hardware design forms the processor architecture. But what is a 64-bit or 32-bit architecture, and how do the two differ? To answer that, we must first look at instruction sets and how their calculations are performed:
Set of instructions
An instruction set is the collection of operations a processor can execute natively. It consists of many simple, elementary instructions (addition, multiplication, data transfer, and so on) whose execution is defined in advance; if an operation falls outside this set, the processor cannot execute it directly.
As mentioned, the processor is responsible for executing programs: sets of instructions, written in a programming language, that must be followed in logical order and executed exactly step by step.
Since computers do not understand programming languages directly, these instructions must be translated into machine language, or binary form. Binary consists of only two digits, zero and one, representing the two possible states of a transistor: on (one) or off (zero).
In fact, a processor can be viewed as a collection of electrical circuits implementing its instruction set: when an instruction arrives, an electrical signal activates the circuits for that operation, and the processor executes it.
Instructions consist of a fixed number of bits. In an 8-bit instruction, for example, the first 4 bits might encode the operation (the opcode) and the next 4 bits the data to be used (the operand). Instruction lengths range from a few bits to several hundred, and in some architectures different instructions have different lengths.
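The hypothetical 8-bit format described above (4 bits of opcode, 4 bits of operand) can be decoded with two bit operations. The field layout is this example's assumption, not any real instruction set:

```python
# Split an 8-bit instruction into its opcode and operand fields.

def decode(instruction):
    opcode = (instruction >> 4) & 0b1111   # top 4 bits
    operand = instruction & 0b1111         # bottom 4 bits
    return opcode, operand

# 0b0011_0101: opcode 3, operand 5
print(decode(0b00110101))  # → (3, 5)
```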
In general, the set of instructions is divided into the following two main categories:
Reduced instruction set computer (RISC, pronounced "risk"): in a RISC-based processor, the set of defined operations is simple and basic. These processors execute instructions faster and more efficiently and are optimized to minimize execution time; RISC does not need complex circuits, and its design cost is low. RISC-based processors complete each instruction in a single cycle and operate only on data stored in registers, so the instructions are simple, clock frequencies can be higher, data routing is more efficient, and memory is touched only through explicit load and store operations on registers.
Complex instruction set computer (CISC): CISC processors have an additional layer of microcode, or microprogramming, that converts complex instructions into sequences of simple ones (such as addition or multiplication). These microprograms are stored in fast memory and can be updated. A CISC instruction set can contain far more instructions than a RISC set, and instruction formats can have variable length. In effect, CISC is nearly the opposite of RISC: its instructions can span multiple processor cycles, and data routing is not as efficient as in RISC processors. In general, a CISC-based processor can perform several operations with a single complex instruction, but takes multiple cycles to complete it.
RISC vs. CISC or ARM vs. x86
RISC and CISC are the two ends of a spectrum of instruction-set design, and various combinations exist between them. First, the basic differences between RISC and CISC:
RISC (Reduced Instruction Set Computer):
Instructions are simple; each performs a single operation, and the processor can process it in one cycle.
Data routing is simpler and more optimized; the instructions are simple enough to be implemented in stages.
Instructions operate only on data already loaded into registers.
Complex hardware is not required; more of the work is shifted to software.

CISC (Complex Instruction Set Computer):
Instructions perform multiple operations, and the processor cannot process them in a single cycle.
The processors are complex by nature, and instructions are harder to execute.
Instructions can work with RAM directly, with no need for separate load operations.
Hardware requirements are higher: instructions are implemented in hardware, so the software side is often simpler than with RISC. This is why CISC-based programs need less code; the instructions themselves do much of the work.
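The contrast can be made concrete with a hypothetical multiply of two values in memory: one complex CISC instruction versus an equivalent sequence of simple RISC steps. All mnemonics here are invented for illustration.

```python
# The same memory-to-memory multiply, written both ways.

cisc_program = [
    "MULT addr1, addr2",     # one instruction: load both operands from RAM,
]                            # multiply, and write the result back

risc_program = [
    "LOAD  r1, addr1",       # each instruction does one thing and
    "LOAD  r2, addr2",       # completes in roughly a single cycle
    "MUL   r3, r1, r2",      # arithmetic operates only on registers
    "STORE r3, addr1",       # results go back to memory explicitly
]

# The CISC program is shorter to write, but its single instruction takes
# several cycles internally; each RISC instruction takes about one cycle.
print(len(cisc_program), len(risc_program))  # → 1 4
```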
As mentioned, modern processor designs combine the two approaches. For example, the x86 architecture used by Intel and AMD is originally a CISC instruction set, but modern x86 processors use microcode to break complex instructions down into simpler, RISC-like micro-operations. Now that we have explained the two main categories of instruction sets, let us examine their role in processor architecture.
If you pay attention to the processor architecture when choosing a phone or tablet, you will notice that some models use Intel (x86) processors, while others are based on the ARM architecture.
Suppose every processor had its own distinct instruction set: every program would then have to be compiled separately for every processor. A separate build of Windows would be needed for each processor in the AMD family, and thousands of versions of Photoshop would have to be written for different chips. For this reason, standard architectures based on RISC, CISC, or a combination of the two were designed, and their specifications were published for everyone. ARM, PowerPC, x86-64, and IA-64 are examples of such architecture standards; below we introduce two of the most important ones and their differences:
A brief history of processor architecture
In 1823, Baron Jöns Jacob Berzelius first isolated the chemical element silicon (symbol Si, atomic number 14). Thanks to its abundance and strong semiconductor properties, silicon became the main material in processors and computer chips. Over a century later, in 1947, John Bardeen, Walter Brattain, and William Shockley invented the first transistor at Bell Labs, for which they received the Nobel Prize.
The first practical integrated circuit (IC) was unveiled in September 1958, and two years later IBM developed the first automated mass-production facility for transistors in New York. Intel was founded in 1968 and AMD a year later.
Intel invented the first commercial microprocessor in the early 1970s. Called the Intel 4004, it performed 60,000 operations per second with its 2,300 transistors, was priced at $200, and could address only 640 bytes of memory:
The Intel C4004 processor
After Intel, Motorola introduced its first 8-bit processor, the MC6800, with a frequency of one to two MHz. MOS Technology then introduced a faster and cheaper chip, the 6502, variants of which powered gaming consoles of the era such as the Atari 2600 and Nintendo systems, as well as home computers like the Apple II and Commodore 64. Motorola developed its first 32-bit design in 1979, used mainly in Apple's Macintosh and in Amiga computers; a little later, National Semiconductor released the first 32-bit processor for general use.
In 1993, the first PowerPC processor, based on a 32-bit instruction set, was released. It was developed by the AIM consortium (Apple, IBM, and Motorola), and Apple migrated its computers from Motorola's 68k family to PowerPC at that time.
The difference between 32-bit and 64-bit processors (x86 vs. x64): simply put, x86 refers to the family of instruction sets used in one of Intel's most successful processors, the 8086, and its successors. By convention, 32-bit (and earlier 16-bit) processors and Windows versions are called x86, while 64-bit ones are called x64 or x86-64.
The biggest difference between 32-bit and 64-bit processors is their different access to RAM:
The maximum physical memory of the x86 (32-bit) architecture is limited to 4 GB, while the x64 (64-bit) architecture can address vastly more; in practice the ceiling is set by the operating system and hardware rather than by the pointer width. A 64-bit computer can run both 32-bit and 64-bit programs; a 32-bit computer, by contrast, can only run 32-bit programs.
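The 4 GB figure follows directly from the address width: with 32-bit addresses there are only 2^32 distinct byte addresses.

```python
# Why a 32-bit processor tops out at 4 GB of addressable memory.

addressable_bytes_32 = 2 ** 32
print(addressable_bytes_32)                # → 4294967296 bytes
print(addressable_bytes_32 / (1024 ** 3))  # → 4.0 (GiB)

# A 64-bit address space is vastly larger: 2**64 bytes, about 16 exabytes,
# which is why the practical limit is set by the OS and hardware instead.
addressable_bytes_64 = 2 ** 64
print(addressable_bytes_64 // addressable_bytes_32)  # → 4294967296 times larger
```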
In most cases, 64-bit processors are more efficient than 32-bit processors when handling large amounts of data. To find out whether your operating system is 32-bit or 64-bit, follow either of these two paths:
Press the Win + X keys to bring up the context menu and then click System. -> In the window that opens, find the System type entry under the Device specifications section; it shows whether your Windows is 64-bit or 32-bit.
Type msinfo32 in the Windows search box and click on the System Information result. -> In the System Summary pane on the right, find System type to see whether your Windows operating system is x64-based or x86-based.
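As a programmatic alternative to the two menu paths above, Python's standard platform module reports the machine architecture and the bitness of the running interpreter:

```python
# Query the OS and interpreter architecture from the standard library.
import platform

print(platform.machine())          # e.g. 'AMD64' on 64-bit Windows, 'arm64' on Apple silicon
print(platform.architecture()[0])  # '64bit' or '32bit' (bitness of this Python build)
```

Note that a 32-bit Python build on a 64-bit OS will report '32bit', since the second call describes the interpreter, not the machine.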
ARM is a processor architecture developed by the British company Acorn in the mid-1980s. Before ARM, both AMD and Intel processors used Intel's x86 architecture, based on CISC computing, while IBM used RISC computing in its workstations. Acorn was the first company to develop a home computer based on RISC computing, and the architecture's name originally stood for Acorn RISC Machine. Acorn did not manufacture the processors itself; instead, it licensed the ARM architecture to other processor manufacturers. A few years later, the name was changed from Acorn RISC Machine to Advanced RISC Machines.
The original ARM architecture processed 32-bit instructions, and a processor core based on it required as few as about 35,000 transistors, whereas processors based on Intel's CISC-style x86 architecture require millions. This low transistor count, compared with Intel's x86, is the main reason ARM-based processors consume so little energy and suit devices such as phones and tablets.
In 2011, ARM introduced the ARMv8 architecture with support for 64-bit instructions, and a year later Microsoft launched a Windows version compatible with the ARM architecture alongside the Surface RT tablet.
ARM and X86-64 architecture differences
The ARM architecture is designed to be as simple as possible while keeping power dissipation to a minimum. Intel, on the other hand, uses a more complex design with the x86 architecture, better suited to powerful desktop and laptop processors.
Desktop computers moved to 64-bit after AMD introduced the modern x86-64 architecture (also known as x64), which Intel later adopted. A 64-bit architecture is essential for demanding workloads, handling tasks such as 3D rendering and encryption with greater precision and speed. Today both architectures support 64-bit instructions, although x86-64 reached the desktop years before 64-bit ARM reached phones.
When ARM implemented 64-bit support in ARMv8, it took two approaches: AArch32, used to run 32-bit code, and AArch64, used to run 64-bit code.
The ARM architecture is designed to switch between these two modes very quickly, so the 64-bit instruction decoder did not need to be built around 32-bit instructions, yet backward compatibility was preserved. However, ARM has announced that from 2023, processors based on the ARMv9 Cortex-A architecture will support only 64-bit instructions, and support for 32-bit applications and operating systems will end in next-generation processors.
The differences between the ARM and Intel architectures largely reflect each company's achievements and challenges. ARM's focus on energy efficiency suits the sub-5-watt power budgets of mobile phones, yet processors based on this architecture have improved to the level of Intel's laptop chips. Compared with the roughly 100-watt consumption of Intel's desktop Core i7 and Core i9 processors, or comparable AMD parts, this is a remarkable achievement; Intel, for its part, has historically been unable to push x86 power consumption below that 5-watt envelope.
Processors built with more advanced transistors consume less power, and Intel long struggled to move its lithography beyond 14 nm. The company eventually succeeded in shipping processors on its 10 nm process, but in the meantime mobile processors had moved from 20 nm through 14 nm and 10 nm to 7 nm designs, driven by competition between Samsung and TSMC. AMD, meanwhile, unveiled 7 nm Ryzen processors and leapt ahead of its x86-64 rivals.
Nanometer: a thousandth of a meter is a millimeter, a thousandth of a millimeter is a micrometer, and a thousandth of a micrometer is a nanometer; in other words, a nanometer is one billionth of a meter.
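Written out as arithmetic, the chain of conversions above multiplies out to a factor of one billion:

```python
# mm per m × µm per mm × nm per µm = nm per m
nanometers_per_meter = 1_000 * 1_000 * 1_000
print(nanometers_per_meter)  # → 1000000000
```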
Lithography or manufacturing process: lithography (from Greek, literally "writing on stone") refers to the way components are laid out in processors, that is, the process of producing and forming circuits, carried out by specialized manufacturers such as TSMC. From the first processors until a few years ago, the nanometer figure described the spacing of processor components; the 14 nm lithography of the 2015 Skylake processors, for example, meant the components of that processor were separated by about 14 nm. At the time, it was understood that the smaller the lithography or manufacturing process, the more efficient the energy consumption and the better the performance.
Nowadays, the nanometer figure no longer corresponds to the physical distance between components, and process names have become largely a matter of convention, because these distances cannot be shrunk beyond a certain limit without hurting yields. As technology has advanced, transistor designs have changed, and transistor counts have grown, manufacturers have adopted other solutions, such as 3D stacking, to pack transistors onto processors.
The most distinctive feature of the ARM architecture is its low power consumption when running mobile applications. This achievement comes from ARM's heterogeneous processing capability: the architecture allows work to be divided between powerful and low-power cores, so energy is used more efficiently.
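As a toy sketch of this heterogeneous idea, a scheduler can route demanding tasks to the powerful cores and light or background tasks to the efficiency cores. The task list, load numbers, threshold, and core names below are all invented for illustration.

```python
# Toy heterogeneous (big/little-style) task placement.

def schedule(tasks, threshold=5):
    assignment = {"big": [], "LITTLE": []}
    for name, load in tasks:
        # heavy tasks go to the powerful cores, light ones to efficiency cores
        core = "big" if load >= threshold else "LITTLE"
        assignment[core].append(name)
    return assignment

tasks = [("game render", 9), ("email sync", 1), ("video decode", 7), ("clock widget", 1)]
print(schedule(tasks))
# → {'big': ['game render', 'video decode'], 'LITTLE': ['email sync', 'clock widget']}
```

Real operating systems make this decision dynamically from measured load, but the principle is the same: only wake the power-hungry cores when the work demands it.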
ARM's first attempt in this area dates back to the big.LITTLE design of 2011, which paired large Cortex-A15 cores with small Cortex-A7 cores. The idea of using powerful cores for heavy applications and low-power cores for light and background processing took several iterations and failed experiments to get right, but today ARM is the dominant architecture in the mobile market; iPads and iPhones, for example, use the ARM architecture exclusively.
Meanwhile, Intel's Atom processors, which lacked heterogeneous processing, could not match the performance and efficiency of ARM-based processors, and Intel fell behind ARM in mobile.
Finally, in 2020, Intel used a hybrid core design in its 10 nm Lakefield processors, combining one powerful core (Sunny Cove) with four low-power cores (Tremont), along with integrated graphics and connectivity. But Lakefield was built for laptops with a 7-watt power budget, which is still too high for phones.
Another important distinction between Intel and ARM is how they use their designs. Intel builds its architecture into processors it manufactures and sells itself, whereas ARM licenses its designs and architecture, with room for customization, to other companies such as Apple, Samsung, and Qualcomm, which can then modify the design and instruction set to suit their own goals.
Building custom processors is expensive and complicated for the companies that undertake it, but done right, the end products can be very powerful. Apple, for example, has repeatedly proven that customizing the ARM architecture can bring its processors up to par with x86-64, or beyond.
Apple ultimately plans to remove Intel processors from all of its Mac products and replace them with ARM-based silicon. The M1 chip was Apple's first step in this direction, released in the MacBook Air, MacBook Pro, and Mac mini; the M1 Max and M1 Ultra chips then showed that the ARM architecture, combined with Apple's improvements, can challenge x86-64.
As mentioned earlier, standard architectures based on RISC, CISC, or a combination of the two were published for everyone to use, and applications must be compiled for the processor architecture they run on. This was once a minor concern, given how few platforms and architectures there were, but today far more applications need separate builds to run on different platforms.
ARM-based Macs, Google's Chrome OS, and Microsoft's Windows are all examples of today's platforms that need software to run on both the ARM and x86-64 architectures. Compiling software natively for each architecture is the ideal solution in this situation.
In practice, however, these platforms can also emulate one another: code compiled for one architecture can be executed on another. Developing an application this way, rather than natively for each platform, comes with a performance penalty, but the mere possibility of translating code is very valuable for now.
After years of development, the Windows emulator for ARM-based platforms now delivers acceptable performance for most applications; Android applications run more or less satisfactorily on Intel-based Chromebooks; and Apple has developed its own code-translation tool, Rosetta 2, to support older Mac applications built for the Intel architecture.
However, as mentioned, all three approaches perform worse than if the program had been written from scratch for each platform separately. Overall, the ARM and Intel x86-64 architectures can be compared as follows:
CISC vs. RISC
ARM: The ARM architecture is a licensed processor design, so it has no single manufacturer; chips based on it are built by many companies and power the processors of Android phones and iPhones.
x86-64: The x86 architecture was developed by Intel (and is also licensed, notably to AMD) and dominates desktop and laptop processors.
Complexity of instructions
ARM: The ARM architecture typically executes an instruction in a single cycle, which makes processors based on it well suited to devices that need simpler processing.
x86-64: The Intel architecture (x86 in its 32-bit form, associated with 32-bit Windows applications) uses CISC computing, so it has a more complex instruction set whose instructions can take several cycles to execute.
Mobile CPUs vs. desktop CPUs
ARM: Because the ARM architecture shifts more of the complexity into software, it is used more in the design of phone processors; ARM generally works better in smaller devices that do not have constant access to mains power.
x86-64: Because Intel’s x86 architecture relies more on hardware, it is typically used in processors for larger devices such as desktops; Intel focuses more on performance and suits a wider range of technologies.
Energy consumption
ARM: Thanks to its single-cycle execution, the ARM architecture not only consumes less energy but also runs at a lower temperature than Intel’s x86; this makes it ideal for phone processors, since it reduces the energy needed to keep the system running and execute the user’s commands.
x86-64: Intel’s architecture is focused on performance, which is no problem for desktop or laptop users with access to an effectively unlimited power source.
Processor speed
ARM: CPUs based on the ARM architecture are usually slower than their Intel counterparts, because they trade raw computing power for efficient consumption.
x86-64: Processors based on Intel’s x86-64 architecture are chosen where faster computing matters.
Operating system
ARM: The ARM architecture dominates Android phone processors. Although x86-based devices can also run the full range of Android applications, those applications must be translated before running, which costs time and energy, so battery life and overall performance may suffer.
x86-64: Intel reigns as the dominant architecture in tablets and PCs running Windows. Microsoft did release the Surface Pro X in 2019 with an ARM-based processor running the full version of Windows, but if you are a gamer or expect more from your tablet than just running Windows, the Intel architecture is still the safer choice.
In the competition between Arm and x86 over the past decade, ARM has emerged as the winner for low-power devices such as phones, and it has also made great strides in laptops and other devices where energy efficiency matters. Intel, on the other hand, lost the phone market, but its efforts to optimize energy consumption have brought significant improvements over the years, and with hybrid designs such as Lakefield and Alder Lake its processors now have more in common with ARM-based ones than ever before. Arm and x86 remain distinctly different from an engineering point of view, each with its own strengths and weaknesses, yet it is no longer easy to separate their use cases, as support for both architectures keeps growing across ecosystems.
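As a practical aside, software that must behave differently on ARM and x86-64 can ask the operating system which architecture it is running on. The sketch below uses Python’s standard platform module; the set of machine strings it recognizes is an assumption for illustration, since real systems report many variants.

```python
import platform

# Map common platform.machine() strings to an architecture family.
# This mapping is illustrative, not exhaustive.
_FAMILIES = {
    "x86_64": "x86-64",
    "amd64": "x86-64",
    "arm64": "ARM",
    "aarch64": "ARM",
}

def arch_family(machine: str) -> str:
    """Return 'x86-64', 'ARM', or 'unknown' for a machine string."""
    return _FAMILIES.get(machine.lower(), "unknown")

if __name__ == "__main__":
    # On an Apple Silicon Mac this prints "ARM"; on most PCs, "x86-64".
    print(arch_family(platform.machine()))
```

An installer or emulation layer could branch on this result to pick native binaries where they exist and fall back to translation otherwise.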
Processor performance indicators
Processor performance has a great impact on how quickly programs load and how smoothly they run, and there are various measures of it, of which frequency (clock speed) is one of the most important. Be careful, though: the frequency of each core indicates its processing power, but it does not by itself represent the overall performance of the processor. Many other factors must also be considered, such as the number of cores and threads, the internal architecture (synergy between cores), cache capacity, overclocking headroom, thermal design power, power consumption, and IPC.
Synergy is an effect that results from the interaction of two or more elements; if the combined effect is greater than the sum of what each element could achieve individually, synergy has occurred.
In the following, we will explain more about the factors influencing the performance of the processor:
Processor frequency
One of the most important factors in choosing and buying a processor is its frequency (clock speed), which is usually a fixed number shared by all of its cores. The number of operations the processor performs per second is known as its speed and is expressed in hertz: megahertz (MHz) for older processors, or gigahertz (GHz) today.
More precisely, frequency refers to the number of computing cycles the processor cores perform per second and is measured in gigahertz (GHz, billions of cycles per second).
For example, a 3.2 GHz processor performs 3.2 billion cycles per second. In the early 1970s, processors passed one megahertz (MHz), or one million cycles per second, and around 2000 the gigahertz (GHz), equal to one billion hertz, became the usual unit for measuring their frequency.
Sometimes multiple instructions are completed in one cycle, and in other cases a single instruction may take several cycles. Since each architecture and design executes instructions in its own way, cores running at the same frequency can differ in processing power. In fact, without knowing the number of instructions processed per cycle (IPC), comparing the frequencies of two processors is meaningless.
Suppose we have two processors, one made by company A and one by company B, both running at 1 GHz. With no other information, we might consider them equal in performance; but if company A’s processor completes one instruction per cycle while company B’s completes two, processor B will obviously be faster.
In simpler words, at the same frequency, a processor with a higher IPC can do more processing and is more powerful. So, to properly evaluate the performance of each processor, in addition to the frequency, you will also need the number of instructions it performs in each cycle.
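The frequency-times-IPC reasoning above can be written as a one-line formula. The sketch below is a deliberately rough model (the function name and the “billions of instructions per second” unit are my own framing, not a standard benchmark):

```python
def relative_performance(freq_ghz: float, ipc: float) -> float:
    """Very rough single-core throughput: billions of instructions per second."""
    return freq_ghz * ipc

# Company A: 1 GHz, 1 instruction per cycle.
# Company B: 1 GHz, 2 instructions per cycle.
a = relative_performance(1.0, 1.0)
b = relative_performance(1.0, 2.0)
# Despite identical frequencies, B does twice the work per second.
```

Real processors complicate this with caches, branch prediction, and workload-dependent IPC, which is why benchmarks, not formulas, settle comparisons in practice.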
Therefore, it is better to compare a processor’s frequency only against processors of the same series and generation. A five-year-old processor with a high frequency may well be outperformed by a newer processor with a lower frequency, because newer architectures handle instructions more efficiently.
Intel’s X-series processors may outperform higher-frequency K-series processors because they split tasks between more cores and have larger caches; On the other hand, in the same generation of processors, a processor with a higher frequency usually performs better than a processor with a lower frequency in many applications. This is why the manufacturer company and processor generation are very important when comparing processors.
Base frequency and boost frequency: The base frequency of any processor is the minimum frequency it runs at when idle or under light load; the boost frequency, on the other hand, indicates how high the processor can clock when performing heavier calculations or more demanding work. Boost frequencies are applied automatically and are limited by the heat generated under heavy load, before the processor reaches unsafe operating levels.
In fact, a processor’s frequency cannot be raised without running into physical limits (mainly power and heat), and beyond roughly 3 GHz, power consumption rises disproportionately.
Cache memory
Another factor that affects processor performance is the capacity of the processor’s cache memory. The cache is a small memory that, because it sits next to (or inside) the processor, works much faster than the system’s main RAM; the processor uses it to store data temporarily and reduce the time spent transferring data to and from system memory.
Cache can therefore have a large impact on processor performance: the more cache a processor has, the better it tends to perform. Fortunately, benchmark tools are available to everyone nowadays, so users can evaluate processor performance themselves regardless of manufacturers’ claims.
Cache memory is layered, and the layers are labeled with the letter L. Processors usually have up to three or four cache levels: the first level (L1) is faster than the second (L2), L2 is faster than L3, and L3 is faster than L4. Cache typically offers up to several tens of megabytes of storage, and the more of it there is, the higher the processor’s price.
The cache holds data at a higher speed than the computer’s RAM and therefore reduces delays in executing instructions; the processor first checks the cache for the data it needs, and only if the data is not there does it go to RAM.
Level-one cache (L1), also called primary or internal cache, is the memory closest to the processor; it is the fastest and smallest of the cache levels and stores the most critical data, because the processor consults L1 first when processing an instruction.
Level-two cache (L2), sometimes called external cache, is slower and larger than L1 and, depending on the processor design, may be shared between cores or dedicated to each. Unlike L1, L2 sat on the motherboard in older computers, but in modern processors it is on the chip itself and has lower latency than the next layer, L3.
The L3 cache is shared by all the cores in the processor and has a larger capacity than the L1 or L2 cache, but it is slower.
Like L3, L4 cache is larger and slower than L1 or L2; L3 and L4 are usually shared.
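The “check each level, then fall back to RAM” behavior described above can be sketched as a toy model. The level names follow the text; the latency numbers (in cycles) are illustrative assumptions, not figures for any real CPU:

```python
# Toy model of a cache hierarchy lookup: the CPU checks L1, then L2,
# then L3, and finally main RAM. Latencies in cycles are invented.
LEVELS = [("L1", 4), ("L2", 12), ("L3", 40), ("RAM", 200)]

def lookup(address, contents):
    """Return (level_name, total_cost) for the first level holding `address`.

    `contents` maps a level name to the set of addresses it caches;
    RAM is assumed to hold everything.
    """
    cost = 0
    for name, latency in LEVELS:
        cost += latency
        if name == "RAM" or address in contents.get(name, set()):
            return name, cost

# A hot address lives in every level; a cold one only in RAM.
contents = {"L1": {0x10}, "L2": {0x10, 0x20}, "L3": {0x10, 0x20, 0x30}}
```

An L1 hit costs 4 cycles in this model, while a miss that falls all the way to RAM accumulates every level’s latency — which is exactly why larger, better-managed caches improve real-world performance.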
Processing cores
The core is the processing unit of the processor that can independently perform or process computing tasks; in that sense, a core is a small processor within the central processing unit as a whole. It consists of the arithmetic logic unit (ALU), the control unit (CU), and registers, which together process instructions through the fetch-decode-execute cycle.
Early processors had only one core, but today’s processors are mostly multi-core, with two or more cores on a single integrated circuit handling two or more processes simultaneously. Note that each core can still execute only one instruction at a time. Processors with multiple cores run sets of instructions or programs faster than before by using parallel computing. Of course, more cores do not automatically mean better overall performance, because many programs still do not use parallel processing.
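To make the parallel-computing idea concrete, here is a minimal sketch of how a program might divide independent tasks into one chunk per available core before dispatching them. The function name and the round-robin scheme are my own illustration; real schedulers are far more sophisticated:

```python
import os

def chunk_for_cores(work, n_cores=None):
    """Split a list of independent tasks into one chunk per core.

    Uses round-robin assignment; defaults to the machine's core count
    as reported by os.cpu_count().
    """
    n = n_cores or os.cpu_count() or 1
    return [work[i::n] for i in range(n)]

# Ten tasks spread across four cores: every task lands in exactly one chunk.
chunks = chunk_for_cores(list(range(10)), n_cores=4)
```

Each chunk could then be handed to a worker (for example via Python’s multiprocessing module) to run genuinely in parallel — but only if the tasks really are independent, which is precisely why many programs cannot benefit from extra cores.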
Single-core processors: The oldest type of processor has a single core, can execute only one instruction at a time, and is inefficient for multitasking. Starting one process requires the previous one to finish, and running more than one program degrades performance significantly. A single-core processor’s performance is judged by its power and frequency.
Dual-core processors: A dual-core processor contains two cores and performs like two single-core processors combined. Unlike a single core, it can switch back and forth between multiple data streams, so when several threads are running, a dual-core processor handles the processing load more efficiently.
Quad-core processors: A quad-core processor refines the multi-core design by dividing the workload among four cores, providing more effective multitasking; it is therefore better suited to gamers and professional users.
Six-core processors (Hexa-Core): A six-core processor runs processes at a higher speed than quad- and dual-core designs. Some of Intel’s Core i7 processors, for example, have six cores and are suitable for everyday use.
Octa-Core processors: Octa-core processors have eight independent cores and offer better performance still; they are often built as a dual set of quad-core clusters that split different activities between them. In many cases only the minimum required cores are used, and when demand spikes, the other four cores join the calculation.
Ten-core processors (Deca-Core): Ten-core processors consist of ten independent cores and are more powerful at executing and managing processes than the types above. They are the fastest at multitasking, and more of them reach the market every day.
Difference between single-core and multi-core processing
In general, the choice between a powerful single-core processor and a multi-core processor of ordinary power depends entirely on how you use the machine; there is no one-size-fits-all prescription. Strong single-core performance matters for software that does not need, or cannot use, multiple cores. More cores do not necessarily mean more speed, but if a program is optimized for multiple cores, it will run faster with more of them. If you mostly use applications optimized for single-core processing, you probably won’t benefit from a processor with a large number of cores.
Say you want to take two people from point A to point B: a Lamborghini will do just fine. But if you want to transport 50 people, a bus is a faster solution than multiple Lamborghini trips. The same applies to single-core versus multi-core processing.
In recent years, processor cores have become ever smaller, so more of them fit on a single chip; the operating system and software must in turn be optimized to divide instructions among those cores and execute them simultaneously. When this is done correctly, the performance is impressive.
How do processors use multiple cores?
How do Windows and other operating systems use multiple cores in a processor?
In traditional multi-core processors, all cores were identical, with the same performance and power rating. The problem is that when such a processor is idle or doing light work, its energy consumption cannot drop below a certain floor. That is no concern with unlimited access to mains power, but it becomes a problem when the system relies on a battery or another limited power source.
This is where the concept of asymmetric processor design was born. For smartphones, ARM quickly adopted a solution in which some cores are more powerful and deliver better performance, while others are implemented for low consumption; the low-power cores are good only for background tasks or basic applications such as reading and writing email or browsing the web.
High-powered cores automatically kick in when you launch a video game or when a heavy program needs more performance to do a specific task.
Although combining high-power and low-power cores in a processor is not a new idea, it was uncommon in computers, at least until Intel released its 12th-generation Alder Lake processors.
Every model in Intel’s 12th generation contains E cores (efficient, low-power) and P cores (performance); the ratio between the two varies, but in the Alder Lake Core i9 series, for example, eight cores handle heavy processing and eight handle light processing. The i7 and i5 series use 8P+4E and 6P+4E layouts, respectively.
A hybrid core architecture has many advantages, and laptop users benefit the most, because most daily tasks, such as web browsing, do not require intensive performance. When only the low-power cores are engaged, the computer or laptop stays cool and the battery lasts longer.
Low-power cores are simple and inexpensive to produce, so using them to boost and free up powerful, advanced cores seems like a smart idea.
Even if you have your system connected to a power source, the presence of low-power cores will be efficient. For example, if you are engaged in gaming and this process requires all the power of the processor, powerful cores can meet this need, and low-power cores are also responsible for running background processes or programs such as Skype, etc.
At least in Intel’s Alder Lake processors, the P and E cores are designed not to interfere with each other, so each can perform its tasks independently. Unfortunately, since mixing different core types is a relatively new concept for x86 processors, this fundamental change to the architecture has come with its share of problems.
Before hybrid cores (a mix of powerful P cores and low-power E cores) existed, software developers had no reason to account for such an architecture; their software could not tell low-power and high-power cores apart, which in some cases led to reports of crashes or strange behavior (for example, in software protected by Denuvo).
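The P-core/E-core division of labor described above can be sketched as a toy scheduler. The task names, weights, and the “heaviness” threshold below are invented for illustration; a real operating-system scheduler uses far richer signals than a single number:

```python
# Toy model of hybrid-core scheduling: demanding tasks go to the
# performance (P) cores, light background tasks to the efficient (E) cores.
def assign_tasks(tasks, heavy_threshold=50):
    """tasks: list of (name, weight) pairs, weight 0-100 (invented scale).

    Returns a dict mapping core type ('P' or 'E') to the task names
    scheduled on it.
    """
    plan = {"P": [], "E": []}
    for name, weight in tasks:
        plan["P" if weight >= heavy_threshold else "E"].append(name)
    return plan

# A game and a render job land on P cores; email and a chat app on E cores.
plan = assign_tasks([("game", 90), ("email", 5), ("render", 80), ("chat", 10)])
```

Software that predates hybrid designs effectively treated every core as a “P” core, which is one way to picture the scheduling confusion mentioned above.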
Processing threads
Processing threads are streams of instructions sent to the processor. Each core can normally process one instruction stream at a time, so if two are sent, the second must wait until the first finishes, which can slow the processor down. To address this, processor manufacturers divide each physical core into two virtual cores (threads), each able to execute a separate instruction stream, so a core with two threads can run two processing threads at the same time.
Active processing versus passive processing
Active processing refers to work that requires the user’s ongoing manual input to complete; common examples include motion design, 3D modeling, video editing, and gaming. In this kind of processing, single-core performance and high clock speed matter most, so it benefits from fewer but more powerful cores for smooth performance.
Passive processing, on the other hand, covers instructions that can usually be executed in parallel and left to run on their own, such as 3D and video rendering; such workloads call for processors with a large number of cores and threads, such as AMD’s Threadripper series.
One of the influential factors in passive processing is a high thread count and the ability to use it. Simply put, a thread is a stream of data sent from an application to the processor for processing; threads let the processor perform several tasks at the same time efficiently and quickly. In fact, it is thanks to threads that you can listen to music while surfing the web.
Threads are not physical components of the processor but represent the amount of processing that the processor cores can do, and to execute several very intensive instructions simultaneously, you will need a processor with a large number of threads.
The number of threads in a processor is directly related to its number of cores: each core can usually run two threads, and every processor allocates at least one thread to each process it performs.
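The “music while browsing” example maps directly onto software threads. Below is a minimal sketch using Python’s standard threading module, with two placeholder tasks standing in for the music player and the browser (the names are mine):

```python
import threading

results = []
lock = threading.Lock()

def task(name):
    # Stand-in for real work such as decoding audio or rendering a page.
    with lock:
        results.append(name)

# Launch both tasks concurrently, then wait for each to finish.
threads = [threading.Thread(target=task, args=(n,)) for n in ("music", "web")]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Both tasks have completed; their finishing order depends on the scheduler.
```

The lock illustrates a cost of threading the article doesn’t dwell on: threads sharing data must coordinate, or they corrupt each other’s state.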
What is hyperthreading or SMT?
Hyperthreading in Intel processors and simultaneous multithreading (SMT) in AMD processors are concepts to show the process of dividing physical cores into virtual cores; In fact, these two features are a solution for scheduling and executing instructions that are sent to the processor without interruption.
Today, most processors support hyperthreading or SMT and run two threads per core. However, some low-end processors, such as Intel’s Celeron series or AMD’s Ryzen 3 series, lack this feature and run only one thread per core. Even some high-end Intel processors ship with hyperthreading disabled for reasons such as market segmentation, so it is generally wise to check the cores-and-threads section of the spec sheet before buying any processor.
Hyperthreading or simultaneous multithreading helps to schedule instructions more efficiently and use parts of the core that are currently inactive. At best, threads provide about 50% more performance compared to physical cores.
In general, if you mostly do active processing such as 3D modeling, you probably won’t use all of your CPU’s cores, because that kind of work usually runs on only one or two of them. For workloads such as rendering that use all available cores and threads, however, hyperthreading or SMT can make a significant difference in performance.
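The core/thread arithmetic above is easy to sketch. The 50% best-case uplift comes from the text; the function names and the “core units” framing are my own simplification:

```python
def logical_cores(physical: int, smt: bool) -> int:
    """With hyperthreading/SMT enabled, each physical core exposes two threads."""
    return physical * (2 if smt else 1)

def best_case_throughput(physical: int, smt: bool, smt_gain: float = 0.5) -> float:
    """Rough best-case throughput in 'core units'.

    Per the text, SMT adds at most ~50% on top of the physical cores;
    it does not double performance.
    """
    return physical * (1 + smt_gain) if smt else float(physical)

# An 8-core chip with SMT advertises 16 threads, but at best behaves
# like roughly 12 physical cores, not 16.
threads = logical_cores(8, smt=True)
effective = best_case_throughput(8, smt=True)
```

This is why a “16-thread” part on a spec sheet is not equivalent to a true 16-core processor.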
CPU in gaming
Before multi-core processors, computer games were developed for single-core systems; but after AMD introduced the first dual-core processor in 2005, followed by four-, six-, and eight-core parts, games were no longer held back by core count, because processors could now execute several different operations at the same time.
To have a satisfying gaming experience, every gamer must pair the processor and the graphics processor in a balanced way (we will examine the graphics processor and its function in a separate article). If the processor is weak or slow and cannot issue commands fast enough, the system’s graphics cannot use its full power; in that situation we say the processor has become a bottleneck. The opposite, of course, can also happen.
What is a bottleneck?
In computing, a bottleneck is a limit on one component’s performance caused by the gap between the maximum capabilities of two pieces of hardware. Simply put, if the graphics unit can consume instructions faster than the processor can deliver them, it sits idle until the next batch of instructions is ready and renders fewer frames per second; graphics performance is then limited by the processor.
The same can happen in the opposite direction: if a powerful processor sends commands faster than the graphics unit can handle them, the processor’s capabilities are limited by the weaker graphics.
A system built from a well-matched processor and graphics card performs better and more smoothly for the user; such a system is called a balanced system. In general, a balanced system is one in which no hardware component bottlenecks the user’s workloads, delivering a better experience without disproportionate use (too much or too little) of any system component.
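The bottleneck idea reduces to a simple rule: the pipeline runs at the speed of its slowest stage. Here is a minimal sketch of that rule; the frame-rate numbers in the example are illustrative:

```python
def frame_rate(cpu_fps: float, gpu_fps: float) -> float:
    """The slower component limits the pipeline: delivered fps is the minimum."""
    return min(cpu_fps, gpu_fps)

def bottleneck(cpu_fps: float, gpu_fps: float) -> str:
    """Name the limiting component, or report a balanced pairing."""
    if cpu_fps < gpu_fps:
        return "CPU"
    if gpu_fps < cpu_fps:
        return "GPU"
    return "balanced"

# A GPU capable of 144 fps paired with a CPU that can only prepare
# 60 frames per second delivers 60 fps, and the CPU is the bottleneck.
delivered = frame_rate(60, 144)
limiter = bottleneck(60, 144)
```

The model also shows why buying only the most expensive GPU can be wasted money: raising gpu_fps does nothing once cpu_fps is the minimum.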
It is better to pay attention to a few points to set up a balanced system:
You can’t build a balanced system for an ideal gaming experience simply by buying the most expensive processor and graphics card on the market.
Bottlenecks are not necessarily caused by low-quality or old components; they stem from mismatched performance between pieces of system hardware.
Graphics bottlenecks are not specific to high-end systems; balance matters just as much in systems with low-end hardware.
Bottlenecks are not exclusive to the processor and graphics card, but balancing those two components prevents the problem to a large extent.
Setting up a balanced system
In gaming and graphics workloads, when the graphics card is not reaching its full potential, extra processor power noticeably improves the experience only if the graphics unit and the processor are well coordinated; the type and genre of game are also important factors in choosing hardware. Quad-core processors can still run a wide range of games, but six cores or more will certainly give you smoother performance; today, multi-core processors are considered essential for any gaming system running titles such as first-person shooters (FPS) or online multiplayer games.
The Galaxy S24 FE, Samsung’s budget flagship, still carries some of its predecessors’ weaknesses despite a significant increase in processing power.
Galaxy S24 FE review: insufficient steps on the path of evolution
These days it is rare to find anyone unfamiliar with Samsung’s Fan Edition series. FE products are very popular thanks to hardware and build quality on par with flagship phones at a lower price; they are often referred to as flagship killers as much as fan-pleasers. The existence of strong competitors such as Xiaomi’s T series is another strong reason for the FE family to live on.
Artificial intelligence is the leading actor in today’s technology world, so companies like Samsung have seized the opportunity, riding the prevailing hype to stay in the spotlight and using AI as an important trump card against competitors.
Now it is the FE series’ turn to inherit the AI capabilities of its flagship siblings along with their hardware features. But not everything is artificial intelligence, and we should not forget that the FE series, for all its merits, is under the microscope: its previous generations did not leave a bright record in chip performance stability or battery life. Has Samsung found a solution to this problem? We will answer that question in this Galaxy S24 FE review.
Galaxy S24 FE Design: Goodbye to the crowd
The first big change in the S24 FE’s appearance is not a new design language but an increase in size: the screen has grown from 6.5 to 6.7 inches, which may be good news for some and unwelcome for others. Since most phones now share similar dimensions, Samsung arguably could have left the size alone to stay distinctive and keep its appeal to fans of compact phones.
With the larger body, the screen-to-body ratio has reached 88 percent, which makes the previous generation’s wide bezels less noticeable. I stress “less noticeable”: the borders around the Galaxy S24 FE’s screen are not significantly slimmer than before, and you still have to get used to seeing them.
The Galaxy S24 FE’s screen protection has improved after the S23 FE’s major step back: it now uses Gorilla Glass Victus+. The back panel, however, is still Gorilla Glass 5, and its shiny, glossy surface remains a magnet for fingerprints; on the plus side, it gives good grip and keeps the phone from sliding out of your hand.
The previous generation’s curved, matte aluminum frame has been replaced by a flat one, so the Galaxy S24 FE doesn’t break from the Koreans’ design language this year. The change improves the phone’s stability in the hand, but it reduces its elegance, and even with a 0.2 mm reduction in thickness, it looks rougher than its predecessor.
With the larger size and bigger battery, the S24 FE has gained four grams, reaching 213 grams. The added weight is not very noticeable, but because of the larger size, the device does not sit in the hand as easily as the previous generation. Together, these made it feel to me more like a mid-range phone than a flagship killer, although these days Samsung phones from low-end to flagship look more or less alike.
The Galaxy S24 FE comes in green, blue, yellow, silver, and graphite gray, and with its IP68 rating it can survive 30 minutes in 1.5 meters of water. The bottom edge holds the USB 3.2 Type-C port, the speaker grille, and microphone holes; the top edge hosts the SIM tray and the secondary, noise-canceling microphone. The power and volume buttons sit on the right edge, in the same arrangement as before.
Galaxy S24 FE screen and speaker: accurate and attractive
The Galaxy S24 FE uses the brand’s Dynamic AMOLED 2X panel, which renders content at FHD+ resolution (2340 x 1080 pixels) with a 120 Hz refresh rate. Because the screen has grown by 0.2 inch, the pixel density has dropped very slightly, by an amount the naked eye cannot detect.
The S24 FE’s panel is not a low-power LTPO type, and its refresh rate only switches between 60 and 120 Hz depending on the content; as a result, the always-on display (AOD), unlike on the S24 series, cannot show a background image and displays only the clock and notifications.
Like other Samsung phones, the S24 FE’s screen uses 8-bit color depth, displaying 16 million colors; standards such as Dolby Vision are therefore not supported, and you can only watch content in the HDR10+ standard. The good news is that, by Zoomit’s measurement, the device’s brightness while playing such content exceeds 2240 nits; maximum brightness under sunlight does not pass 1695 nits, which is still 30 percent brighter than before.
The S24 FE display offers two color profiles, Natural and Vivid, tuned to the sRGB and DCI-P3 color spaces. From the S24 series onward, Samsung changed the Vivid profile so that, in its own words, it shows colors more naturally; as a result, switching from Natural to Vivid no longer brings back the old vibrancy and wider DCI-P3 coverage by default, and we see little change apart from the colors turning cooler.
Galaxy S24 FE display performance against the competition
| Product | Min. brightness (nits) | Max. brightness, manual (nits) | Max. brightness, auto/peak (nits) | Contrast ratio | sRGB coverage | sRGB avg. error | DCI-P3 coverage | DCI-P3 avg. error |
|---|---|---|---|---|---|---|---|---|
| Galaxy S24 FE | 1.7 | 520 | 2240 (HDR) | ∞ | 99.5% (Natural) | 1.4 | 94.3% (Vivid) | 3.2 |
| Galaxy S23 FE | 1.9 | 607.5 | 1315 | ∞ | 98.7% (Natural) | 1.5 | 100% (Vivid) | 4 |
| Galaxy S24 Plus | 0.8 | 587 | 2965 (HDR) | ∞ | 95.6% (Natural) | 3.2 | 83.3% (Vivid) | 4.6 |
| Motorola Edge 50 Pro | 2.6 | 640 | 2300 (HDR) | ∞ | 100% (Natural) | 0.8 | 100% (Vibrant) | 1.2 |
| Pixel 8 | 2 | 1228 | 2200 (HDR) | ∞ | 97.9% (Natural) | 0.8 | 84.3% (Adaptive) | 2.8 |
For more vivid colors, you can use the Vividness slider in the Advanced Settings section. Pushed to the maximum, it largely restores the old visual experience, although the S24 FE still falls slightly short of covering the DCI-P3 range 100 percent.
The Natural profile renders completely neutral colors with high accuracy. The Vivid profile is also very accurate, with a DeltaE of 3.2, but choosing it brings the same cold tendency in color rendering.
The optical fingerprint sensor under the display sits slightly lower than on the S23 FE, and it is as fast and accurate as before. Its speed and accuracy cannot match ultrasonic sensors, but rest assured it has no trouble reading fingerprints through a protective glass.
The Galaxy S24 FE uses the earpiece as a second speaker to produce stereo sound. The speakers deliver high, balanced volume, and as with the Galaxy S23 FE, channel separation is good as long as you don't play music at full blast; the bass, though, may lack its usual thump in some songs.
S24 FE performance and charging: with the power of a full-fledged flagship
An Exynos chip at the heart of the Fan Edition has become a tradition, and Samsung follows the same path this year. Now that, after a one-year gap, the base and Plus models of the S24 series have also shipped with an Exynos chip, Samsung equips the S24 FE with the Exynos 2400e to keep some distance between the flagship series and the Fan Edition. Unlike last year, there is also no Snapdragon version: every S24 FE ships with the Samsung chip.
Technical specifications of the chip
| Specifications/Chip | Exynos 2400 | Exynos 2400e | Exynos 2200 |
|---|---|---|---|
| CPU | 1x 3.21 GHz Cortex-X4, 2x 2.9 GHz Cortex-A720, 3x 2.6 GHz Cortex-A720, 4x 1.95 GHz Cortex-A520, 8 MB system cache | 1x 3.11 GHz Cortex-X4, 2x 2.9 GHz Cortex-A720, 3x 2.6 GHz Cortex-A720, 4x 1.95 GHz Cortex-A520, 8 MB system cache | 1x 2.8 GHz Cortex-X2, 3x 2.52 GHz Cortex-A710, 4x 1.82 GHz Cortex-A510 |
| GPU | Xclipse 940 at 1095 MHz | Xclipse 940 at 1095 MHz | Xclipse 920 at 1306 MHz |
| Memory controller | 4x 16-bit channels, 4200 MHz LPDDR5X, 68.2 GB/s bandwidth | 4x 16-bit channels, 4200 MHz LPDDR5X, 68.2 GB/s bandwidth | 4x 16-bit channels, 4200 MHz LPDDR5, 51.2 GB/s bandwidth |
| Video recording/playback | 8K30 / 4K120 10-bit H.265 | 8K30 / 4K120 10-bit H.265 | 8K30 / 4K120 10-bit H.265 |
| Wireless connectivity | Bluetooth 5.4, Wi-Fi 7 | Bluetooth 5.4, Wi-Fi 7 | Bluetooth 5.2, Wi-Fi 6 |
| Modem | Exynos 5300: 9640 Mb/s download, 2550 Mb/s upload | Exynos 5300: 9640 Mb/s download, 2550 Mb/s upload | Exynos 5300: 7350 Mb/s download, 3670 Mb/s upload |
| Manufacturing process | Samsung 4 nm | Samsung 4 nm | Samsung 4 nm |
The Exynos chip at the heart of the Galaxy S24 FE differs little from the Exynos 2400 inside the S24 flagships: only the frequency of its prime core is 100 MHz lower. Samsung says this change helps the Exynos 2400e consume less energy, stay more stable, and control temperature better across various workloads.
The benchmark results in the table below show that the 100 MHz reduction has no noticeable effect in typical benchmarks: the S24 FE trails the S24 only slightly in single-core and multi-core processing. And since the graphics processor is identical, GPU performance is not significantly different from the flagship family either.
Galaxy S24 FE performance against the competition
| Product | Chip | Speedometer 2.1 (web browsing) | GeekBench 6 GPU compute (Vulkan/Metal) | GeekBench 6 CPU single-core | GeekBench 6 CPU multi-core | GFXBench Aztec Ruins onscreen (fps) | GFXBench Aztec Ruins 1440p Vulkan/Metal (fps) |
|---|---|---|---|---|---|---|---|
| Galaxy S24 FE | Exynos 2400e | 300 | 15539 | 2077 | 6268 | 101 | 71 |
| Galaxy S23 FE | Exynos 2200 | 88 | 9874 | 1608 | 4014 | 55 | 38 |
| Pixel 8a | Tensor G3 | 135 | 6404 | 1763 | 4384 | 60 | 40 |
| Xiaomi 13T Pro | Dimensity 9200 Plus | 121 | 7197 | 1292 | 3591 | 75 | 62 |
| Nothing Phone 2 | Snapdragon 8+ Gen 1 | 155 | 6746 | 1415 | 3959 | 60 | 51 |
| Galaxy S24 | Exynos 2400 | 242 | 16233 | 2141 | 6618 | 107 | 68 |
The generation gap between the S24 FE and S23 FE chips is so large that any comparison becomes almost pointless. The Exynos 2400e not only beats its predecessor in every area, including CPU and GPU processing power, but also outruns nearly every pseudo-flagship from other brands that we have reviewed at Zoomit.
Samsung's 2024 Fan Edition scores about 30 percent better in single-core processing and roughly 60 percent better in multi-core and graphics computing than the previous generation. The game-simulator test improves by about 80 percent, so you can count on the S24 FE for gaming with more confidence.
A heavy title like Genshin Impact gives a better picture of the Samsung chip in realistic, demanding scenarios. With graphics settings maxed out, the game briefly ran at an average of 55 frames per second, but after a few minutes, as the device warmed to about 43 degrees Celsius, performance dropped to 40 fps. Sudden frame drops were another problem we experienced in Genshin Impact.
Zoomit uses GameBench software to measure in-game frame rates.
GPU stability testing gives a better view of how the chip balances performance and temperature during heavy, sustained use. The Galaxy S24 FE starts the test nearly 1,000 points ahead of its S24 sibling but finishes with an equal score; in other words, the 2400e loses less performance under sustained load than the Exynos 2400, which points to better performance management.
Temperature control is less favorable for the Samsung chip, and given our poor memories of Exynos thermals, we watch this area closely. During the stress test, the phone's body reached 46 degrees Celsius, which no one would call progress or improvement.
The Galaxy S24 FE battery is 200 mAh larger than last year's, and since the screen has also grown this year, the battery-test result looks promising: about 13 hours of daily use per charge, 4 hours more than the previous generation.
Galaxy S24 FE battery life versus the competition
| Product | Display (size, refresh rate, resolution) | Battery (mAh) | Video playback (h:mm) | Everyday use (h:mm) |
|---|---|---|---|---|
| Galaxy S24 FE | 6.7 inches, 120 Hz, 2412 x 1080 | 4700 | — | 13:04 |
| Galaxy S23 FE | 6.4 inches, 120 Hz, 2340 x 1080 | 4500 | — | 9:11 |
| Pixel 8a | 6.1 inches, 120 Hz, 2400 x 1080 | 4492 | 20:00 | 12:00 |
| Xiaomi 13T Pro | 6.67 inches, 120 Hz, 2712 x 1220 | 5000 | 16:39 | 10:49 |
| Nothing Phone 2 | 6.7 inches, 120 Hz, 2412 x 1080 | 4700 | 25:50 | 14:58 |
| Galaxy S24 | 6.2 inches, 120 Hz, 2340 x 1080 | 4000 | 27:55 | 12:51 |
This year the Galaxy S24 FE ships with only 8 GB of RAM, a disappointing capacity for a pseudo-flagship Android device in the middle of the on-device AI boom. Alongside the 128 and 256 GB storage options, a 512 GB version is also available; the 128 GB model uses UFS 3.1 storage.
Galaxy S24 FE storage speed compared to competitors
| Phone model | Sequential read | Sequential write |
|---|---|---|
| Galaxy S24 FE | 2426 MB/s | 1514 MB/s |
| Galaxy S24 Ultra | 2473 MB/s | 1471 MB/s |
We had the 256 GB model for review, and judging by the benchmark results, the 256 and 512 GB models appear to use UFS 4.0 storage, which delivers high read and write speeds.
Galaxy S24 FE software: pseudo-flagship full of artificial intelligence
In addition to the flagship chip, the Galaxy S24 FE inherits many software features from the Galaxy S24 family. Since Samsung has talked about its AI constantly since the S24 series was announced, we expected at least some of its native AI capabilities in this year's Fan Edition. Fortunately, on the software side, the Koreans have been especially generous to the Fan Edition this year.
The Galaxy S24 FE offers Samsung's full package of AI features: Sketch to Image, Live Translate for calls, AI photo editing, assistant functions such as Chat Assist and Note Assist, and of course the attractive and extremely practical Circle to Search.
Thanks to the powerful chip, the Galaxy S24 FE can run some AI language models without an internet connection and also has access to the on-device version of Google's AI, Gemini Nano. Since we covered Samsung's AI capabilities in detail in our Galaxy S24 Ultra review, we suggest reading that article.
The Galaxy S24 FE runs Android 14 out of the box, and even setting AI aside, there are interesting features at its heart. For example, the S24 FE now supports Super HDR: the display's brightness is adjusted locally for the bright and dark parts of a photo, independently of the overall device brightness, so photos, like HDR videos, appear more vivid and brighter than before. Along with the promise of 7 years of OS updates, this is another trait shared with this year's Samsung flagships that could encourage any user to buy it.
Galaxy S24 FE camera: more or less better and sometimes worse
For the S24 FE camera, the Koreans have not only kept their usual triple combination of wide, ultrawide, and telephoto but have also left the sensors and lenses untouched, except for the telephoto camera. The same 50-megapixel sensor used in the previous generation as well as the S22 through S24 phones, and the 12-megapixel ultrawide sensor of the S23 FE, return here.
The telephoto camera now uses an 8-megapixel sensor from OmniVision instead of SK Hynix; it is not particularly different in sensor size or pixel pitch and still offers 3x magnification. Apart from the image signal processor, then, no other factor should account for differences in image output.
| Camera | Sensor | Lens | Color filter | Capabilities |
|---|---|---|---|---|
| Wide (main) | 50 MP, 1/1.57 inch, 1.0 µm pixels, Dual Pixel phase-detection autofocus | 23 mm, f/1.8, optical stabilization | Tetrapixel | Video: 8K@30fps, 4K@30/60/120fps, 1080p@30/60/120/240fps, HDR10+ |
| Ultrawide | 12 MP, 1/3 inch, 1.12 µm pixels | 13 mm, f/2.2, no stabilization | RGB Bayer | — |
| Telephoto | 8 MP, 1/4.4 inch, 1.0 µm pixels, phase-detection autofocus | 76 mm (3x magnification), f/2.4, optical stabilization | RGB Bayer | — |
| Selfie | 10 MP, 1/3 inch, 1.22 µm pixels | 25 mm, f/2.4, gyro-based electronic stabilization | RGB Bayer | Video: 4K@30/60fps, 1080p@30/60fps |
Samsung says the S24 FE uses the ProVisual Engine to improve photo quality, reduce noise and blur from hand shake (especially in the dark), and preserve quality in digital zoom. Thanks to the more powerful chip, it can now also record 8K video at 30 fps and even 4K slow motion at 120 fps.
Galaxy S24 FE photo gallery in daylight
Overall, photos from the S24 FE cameras are not much different from the previous generation's; in some scenarios this phone does better, and in others the opposite is true.
In this photo from the main camera, the Galaxy S23 FE has the better dynamic range: the S24 FE failed to balance shadows and highlights, so details in the bright areas are blown out. The yellow cast of S24 FE photos compared with the previous generation is also clearly visible.
Wide picture S23 FE
Wide picture S24 FE
S23 FE wide photo crop
S24 FE wide photo crop
In the examples below, the S24 FE ultrawide camera produces better color and more contrast, and it also beats the S23 FE in detail and dynamic range. If you look at the corners of the frame, distortion from the camera's wide field of view is reduced as well.
S23 FE ultrawide photo
S24 FE ultrawide photo
Ultrawide photo crop S23 FE
S24 FE ultrawide photo cropping
The telephoto cameras of both phones perform very similarly; in some conditions, though, the Galaxy S24 FE records more colorful, punchy, high-contrast photos. It still trails the Galaxy S23 FE in dynamic range, however, and cannot recover detail from bright areas.
S23 FE telephoto photo
S24 FE telephoto photo
S23 FE telephoto crop
S24 FE telephoto crop
To compare detail, I shot with both phones in 50-megapixel mode, and as you can see, the S24 FE extracts more detail from the scene; but this comes at the cost of extra noise and over-sharpening, which pushes the images away from a natural look.
50 megapixel S23 FE photo
50 megapixel S24 FE photo
S23 FE 50-megapixel photo crop
S24 FE 50-megapixel photo crop
In the portrait shots, although the S24 FE photo may look more pleasing, the Galaxy S23 FE has clearly rendered shadows and highlights better and has not manipulated the light reflections, capturing a skin tone closer to reality and, as a result, a more natural photo.
Portrait photo S23 FE
Portrait photo S24 FE
In the dark, because the Galaxy S24 FE can hold steadier, it slows the shutter instead of raising sensitivity in order to capture less noisy photos. For this reason, details in shadows and darker areas may not be as discernible as on the S23 FE; but less processing is applied to the photo, so the result looks more realistic.
Wide picture S23 FE
S23 FE wide photo crop
S23 FE telephoto photo
Wide picture S24 FE
S24 FE wide photo crop
S24 FE telephoto photo
In terms of detail, the S24 FE leads on all cameras. Look at the ultrawide photo: software processing destroys the subject's texture on Samsung's veteran Fan Edition, while the detail remains clearly visible on the new generation.
Photo gallery at night
The S24 FE's selfie camera is the same 10-megapixel unit as before, so no change was expected; surprisingly, though, the previous generation captures more vivid, higher-contrast selfies. In terms of detail, there is no particular difference between them.
A product still involved with constant challenges
The one-year absence of Exynos gave the Korean Fan Edition a second chance at a significant hardware upgrade, so moving from the S23 FE to this phone makes more sense than moving from the S21 FE to the S23 FE did. Longer battery life and seven years of software support, together with the AI features, are the other factors that leave previous FE generations far behind.
Returning to the question at the start of the article, the throttling and heat problems that have always plagued the FE series remain unsolved this year; even the detuned Exynos 2400 could not help. This issue seems set to remain the Achilles' heel of Korean flagships until Samsung gives up on Exynos.
What is the role of the graphics processing unit (GPU) in computers and smart devices? What are its components and how is the image transferred to the screen?
What is a graphics processor? Everything you need to know about GPU
The graphics processing unit (GPU) is a specialized electronic circuit that manages and manipulates memory to accelerate the creation and display of images on the screen. It is built from a number of basic graphics operators, which in their most basic form draw rectangles, triangles, circles, and arcs, and it creates images much faster than a general-purpose processor.
Table of contents
What is a graphics processor (GPU)?
3D image
Bitmapped graphics (.BMP)
Vector graphics
Rendering
Graphics API
What is GDDR?
History of 3D graphics
How to produce 3D graphics
3D modeling
Layout and animation
Rendering
Shading technique
Pixel shaders
Vertex shaders
Difference between GPU and CPU
Familiarity with GPU architecture
Tensor cores
Ray tracing engine
What is GPGPU?
What is CUDA?
Advantages of CUDA cores
Disadvantages of CUDA cores
OpenCL: a CUDA alternative
CUDA and OpenCL vs. OpenGL
OpenCL or CUDA
The most prominent brands
Intel
Nvidia
AMD
The difference between graphics processor and graphics card
Graphics card components
Video memory
Printed circuit board
Display connectors
Bridge
Graphic interface
Voltage regulator circuit
Cooling system
Types of graphics processor
iGPU
dGPU
Cloud GPU
eGPU
Mobile GPU
Types of mobile GPUs
Other applications of graphics processors
Video editing
3D graphics rendering
Machine learning
Blockchain and digital currency mining
In fact, what we see on the screens is the output of the graphics processors that are used in many systems such as phones, computers, workstations, and game consoles.
What is a graphics processor (GPU)?
If we think of the central processing unit (CPU) as the computer's brain, managing all calculations and logical instructions, then the graphics processing unit (GPU) is the unit that manages the visual and graphical output of those calculations and the data related to images. Its parallel structure handles large blocks of data far more efficiently than a CPU does. In effect, the GPU is the graphics intermediary that turns the processor's calculations into a form the user can see, and it is safe to say that any device that displays graphical output contains some kind of graphics processor.
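The "parallel structure" point can be made concrete: a GPU applies one small operation (a kernel) to a huge block of pixels simultaneously, while a CPU core walks through them one at a time; the result is the same, only the execution model differs. A toy sketch in Python, where the brighten function and the four-pixel "image" are invented for illustration:

```python
def brighten(pixel: int, amount: int = 40) -> int:
    """The per-pixel kernel: raise brightness, clamped to the 0-255 range."""
    return min(255, pixel + amount)

# A tiny grayscale 'image' as a flat block of pixel values.
image = [0, 100, 215, 250]

# A CPU visits pixels one by one; a GPU would run the same kernel on
# thousands of pixels at once. Either way the output block is identical:
print([brighten(p) for p in image])  # [40, 140, 255, 255]
```

Because every pixel is independent of its neighbours here, the work splits perfectly across many small cores, which is exactly the workload GPUs are built for.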
In a computer, the graphics processing unit can sit on a graphics card, on the motherboard, or alongside the processor on an integrated chip (AMD's APUs, for example). To quickly identify your graphics card model in Windows, see the linked article.
Integrated chips cannot produce truly impressive graphics output, and their performance will not satisfy any gamer. For higher-quality visual effects, you need a discrete graphics card (we will cover the difference between a graphics processor and a graphics card later) with capabilities beyond a basic GPU. In the following sections, we briefly cover some basic concepts used in graphics.
3D image
An image that has depth in addition to length and width is called a three-dimensional image; it conveys more to the viewer and carries more information than a two-dimensional one. For example, a triangle is just three lines and three angles, but a pyramid-shaped object such as a tetrahedron is a three-dimensional structure made of four triangular faces, six edges, and four vertices.
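For any convex solid such as a pyramid, the counts of vertices, edges, and faces are tied together by Euler's polyhedron formula, V - E + F = 2. This is a standard geometric fact rather than something from the article, and it gives a quick way to check such counts:

```python
def euler_characteristic(vertices: int, edges: int, faces: int) -> int:
    """V - E + F; equals 2 for any convex polyhedron."""
    return vertices - edges + faces

# Triangular pyramid (tetrahedron): 4 vertices, 6 edges, 4 triangular faces.
print(euler_characteristic(4, 6, 4))   # 2
# Square-based pyramid: 5 vertices, 8 edges, 5 faces (4 triangles + 1 square).
print(euler_characteristic(5, 8, 5))   # 2
```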
Bitmapped graphics (BMP.)
Bitmapped graphics, or raster graphics, is a digital image in which each pixel is represented by a number of bits. The image is divided into small squares, the pixels, each holding information such as color and transparency; in raster graphics, every pixel therefore corresponds to a precisely specified, predetermined value.
Raster graphics are resolution-dependent, which means an image produced this way cannot be scaled up without losing visual quality.
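Since every pixel holds a fixed number of bits, the uncompressed size of a bitmap follows directly from its dimensions and bit depth. A small sketch; the Full HD figures below are our own example, not from the article:

```python
def raster_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed bitmap size: one fixed-width value per pixel."""
    return width * height * bits_per_pixel // 8

# A 1920 x 1080 image with 24-bit color (8 bits each for red, green, blue):
print(raster_size_bytes(1920, 1080, 24))  # 6220800 bytes, about 5.9 MiB
```

This is also why raster formats such as PNG and JPEG apply compression: the raw per-pixel representation grows with the square of the resolution.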
Vector graphics
A vector graphic (.ai, .eps, .pdf, or .svg formats) is an image built from paths with start and end points. These paths are based on mathematical expressions and consist of basic geometric shapes such as lines, polygons, and curves. The main advantage of vector over bitmapped graphics is the ability to scale without quality loss: a vector image can be enlarged as far as the rendering device allows, with no degradation.
As mentioned, unlike vector graphics, which mathematical formulas can scale to any size, bitmapped graphics lose quality when scaled. When upscaling, the pixels of a bitmap must be interpolated, which blurs the image; when downscaling, it must be resampled, which discards image data.
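The simplest interpolation method, nearest-neighbour, shows why upscaled bitmaps degrade: the algorithm can only repeat existing pixel values, inventing no new detail, which is what produces the blocky look. A toy sketch with an invented 2x2 checkerboard "image":

```python
def upscale_nearest(image: list[list[int]], factor: int) -> list[list[int]]:
    """Nearest-neighbour upscaling: each source pixel becomes a factor x factor block."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in image
        for _ in range(factor)
    ]

tiny = [[0, 255],
        [255, 0]]      # a 2x2 checkerboard

for row in upscale_nearest(tiny, 2):
    print(row)
# [0, 0, 255, 255]
# [0, 0, 255, 255]
# [255, 255, 0, 0]
# [255, 255, 0, 0]
```

Smarter methods (bilinear, bicubic) average neighbouring pixels instead of copying them, trading blockiness for blur; either way, no new information appears.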
In general, vector graphics are best for creating works of art consisting of geometric shapes, such as logos or digital maps, typefaces, or graphic designs, while raster graphics deal more with real photos and images and are suitable for photographic images.
Vector graphics are therefore a natural fit for banners and logos, because the image looks equally sharp at small and large sizes. One of the most popular programs for viewing and creating vector images is Adobe Illustrator.
Rendering
The process of producing 3D images in software from computational models and displaying them as output on a 2D screen is called rendering.
Graphics API
An application programming interface (API) is a protocol for communication between different parts of computer programs, and an important tool for software to interact with graphics hardware. The protocol may be based on the web, an operating system, a data center, hardware, or software libraries. Many tools exist today for imaging and rendering 3D models, and one important role of graphics APIs is to make that process easier for developers: they give graphics applications virtual access to the underlying platforms and support testing them. Below we introduce some of the best-known graphics APIs:
OpenGL
OpenGL (short for Open Graphics Library) is a library of functions for drawing 3D images: a cross-platform standard and application programming interface (API) for 2D and 3D graphics and rendering, and a graphics accelerator used in video games, design, virtual reality, and other applications. The library has more than 250 calling functions for drawing 3D images and shipped in two variants: Microsoft's (often bundled with Windows or graphics-card installation software) and Cosmo's (for systems without a graphics accelerator).
The OpenGL graphic interface was first designed by Silicon Graphics in 1991 and was released in 1992; The latest version of this API, OpenGL 4.6, was also introduced in July 2017.
DirectX
DirectX is a set of application programming interfaces (APIs) developed by Microsoft to let instructions communicate with audio and video hardware. Games built on DirectX use multimedia features and graphics accelerators more efficiently and show better overall performance.
When Microsoft was preparing to release Windows 95 in late 1994, Alex St. John, a Microsoft employee, studied the development of MS-DOS-compatible games. Their programmers usually rejected porting them to Windows 95, considering game development for the Windows environment too difficult. A three-person team was therefore formed, and within four months it developed the first set of APIs, called DirectX, to solve the problem.
The first version of DirectX was released in September 1995 as the Windows Games SDK, replacing the Win32 DCI and WinG APIs of Windows 3.1. DirectX allowed Windows 95 and all subsequent versions of Microsoft Windows to host high-performance multimedia content.
To boost developer adoption of DirectX, Microsoft offered John Carmack, developer of Doom and Doom 2, a free DirectX port of both games from MS-DOS to Windows 95. Carmack agreed, and Doom 95 shipped in August 1996 as the first game built on DirectX. DirectX 2.0 became part of Windows itself with the next release of Windows 95 and with Windows NT 4.0 in mid-1996.
Because Windows 95 was still young at the time and had few published games, Microsoft promoted the interface heavily, and at one event introduced Direct3D and DirectPlay for the first time in an online demo of the multiplayer game MechWarrior 2. The DirectX team faced the challenge of testing each release against every combination of computer hardware and software: different graphics cards, sound cards, motherboards, processors, inputs, games, and other multimedia applications were tested with each beta and final version, and test suites were even produced and distributed so the hardware industry could verify that new designs and driver versions were DirectX-compatible.
The latest version, DirectX 12, was unveiled in 2014 and officially released a year later alongside Windows 10. This graphics API supports explicit multi-adapter, allowing several GPUs to be used simultaneously in one system.
Before DirectX, Microsoft had included OpenGL in its Windows NT platform; Direct3D was intended as a Microsoft-controlled alternative to OpenGL, initially focused on gaming. OpenGL kept evolving during this period and supported programming techniques for interactive multimedia programs such as games well, but with Microsoft's weight behind the DirectX team, OpenGL gradually fell out of the competition for games.
Vulkan
Vulkan is a low-overhead, cross-platform graphics API for graphics-heavy applications such as games and content creation. What distinguishes it from DirectX and OpenGL is its reduced driver overhead and lower power consumption.
At first, many thought Vulkan would be the next, improved OpenGL and a continuation of its path, but time has shown that prediction was wrong. The following table shows how the two APIs differ.
| OpenGL | Vulkan |
|---|---|
| Has a single global state machine | Object-based, with no global state |
| State is tied to a single context | All state lives in command buffers |
| Operations are performed only sequentially | Supports multi-threaded programming |
| Memory and GPU synchronization are usually hidden | Synchronization and memory can be controlled and managed explicitly |
| Error checking runs continuously | Drivers do no runtime error checking; a validation layer serves developers instead |
Mantle
The Mantle graphics API was a low-overhead interface for rendering 3D video games, first developed in 2013 by AMD together with game developer DICE. The partnership aimed to compete with Direct3D and OpenGL on home computers; however, Mantle was officially discontinued in 2019 and effectively succeeded by the Vulkan graphics API. Mantle reduced the processor's workload and removed bottlenecks in the processing pipeline.
Metal
Metal is Apple's proprietary graphics API, with a shading language based on C++, first used in iOS 8. Metal can be seen as combining the roles of the OpenGL graphics interface and the OpenCL compute framework, giving iOS, Mac, and tvOS an equivalent of APIs such as Vulkan and DirectX 12 on other platforms. In 2017, the second version of Metal arrived with macOS High Sierra, iOS 11, and tvOS 11, more efficient and optimized than its predecessor.
What is GDDR?
The DDR memory used with a graphics processing unit is called GDDR, or GPU RAM. DDR (short for Double Data Rate) is an advanced form of synchronous dynamic RAM (SDRAM) and runs at the same clock frequencies. The difference lies in how often data is sent per clock cycle: DDR transfers data twice per cycle, doubling effective memory speed, while SDRAM transfers only once per cycle. DDR quickly became popular because, besides twice the transfer rate, it was cheaper than SDRAM and consumed less power than older SDRAM modules.
GDDR was introduced in 2006 for fast rendering on the graphics processor. Compared with ordinary DDR, it runs at higher frequencies with less heat, and it replaced VRAM and WRAM. Six generations have been released so far, each faster and more advanced than the last.
GDDR5 is now the previous generation of video RAM, and ten years have passed since the introduction of the last current GDDR standard, GDDR6. GDDR6, with a per-pin transfer speed of 16 Gb/s (double GDDR5) and 32-byte read/write access (equal to GDDR5), is used in Nvidia's RTX 30 series and AMD's latest 6000-series graphics cards. GDDR version numbers do not correspond to DDR's: GDDR5, like GDDR3 and GDDR4, is based on DDR3 technology, while GDDR6 is based on DDR4. In practice, GDDR follows a relatively independent path from DDR in terms of performance.
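Per-pin transfer speeds turn into card-level bandwidth by multiplying by the memory bus width; the formula is standard, though the 256-bit bus below is our own assumption for a typical card, not a figure from the article:

```python
def peak_bandwidth_gb_s(rate_gbit_per_pin: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin rate (Gb/s) times bus width in bytes."""
    return rate_gbit_per_pin * bus_width_bits / 8

# GDDR6 at 16 Gb/s per pin vs GDDR5 at 8 Gb/s, both on an assumed 256-bit bus:
print(peak_bandwidth_gb_s(16, 256))  # 512.0 GB/s
print(peak_bandwidth_gb_s(8, 256))   # 256.0 GB/s
```

The same arithmetic explains why doubling the per-pin rate or widening the bus each doubles the bandwidth available to the GPU's render pipeline.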
The graphics processor's main task is rendering images, but to do this it needs space to store the information for the image being built, so it uses RAM (random access memory). This data includes information about every pixel of the image: its color and its location on the screen. A pixel is a physical point in a raster image, one cell in the rectangular grid (dot matrix) that makes up the picture. RAM also holds completed images until it is time to display them, in an area called the frame buffer.
Before graphics processors were developed, the CPU was responsible for processing and rendering images to produce output; this put heavy pressure on processors and slowed systems down. The sparks of today's 3D graphics were lit by the growth of arcade games, game consoles, military, robotics, and space simulators, as well as medical imaging.
Before examining the history of the graphics processing unit, let us introduce the key concepts of this industry:
History of 3D graphics
The term GPU first appeared in the 1970s as an abbreviation for Graphics Processor Unit, describing a programmable processing unit that functioned independently of the central processing unit (the CPU) and was responsible for composing and outputting graphics. Of course, at that time the term did not carry today's meaning.
In 1981, IBM shipped its first two graphics cards: the MDA (Monochrome Display Adapter) and the CGA (Color Graphics Adapter). The MDA had four kilobytes of video memory and supported only text display; it is no longer used today, though it may still be found in some older systems.
The CGA is considered the first color graphics adapter for personal computers; it had only sixteen kilobytes of video memory and could produce 16 colors at a resolution of 160 × 200 pixels. A year later, Hercules answered IBM's cards with the HGC (Hercules Graphics Card), which carried 64 kilobytes of video memory and combined MDA-style text with bitmapped graphics.
In 1983, Intel entered the graphics card market with the iSBX 275 Video Graphics Multimodule, which could display eight colors at a resolution of 256 × 256. A year later, IBM introduced the PGC (Professional Graphics Controller) and the EGA (Enhanced Graphics Adapter), the latter displaying 16 colors at a resolution of 640 × 350 pixels.
The VGA (Video Graphics Array) standard was introduced in 1987, offering a resolution of 640 × 480 with 16 colors and up to 256 kilobytes of video memory. In the same year, ATI introduced its first VGA graphics card, the ATI VGA Wonder; some models even had a port for connecting a mouse. Until then, video cards had little memory: the CPU handed graphics data to this video memory, and after the calculations and signal conversion were done, the result was shown on the output device.
After the first 3D video games were released, processors could no longer keep up with graphics workloads, and the basic concept of a graphics processing unit took shape. It began with the graphics accelerator, a part that boosted system performance by taking over graphics calculations and lightening the CPU's workload; it had a significant impact on computer performance, especially under heavy graphics loads. In 1992, Silicon Graphics released OpenGL, the first library of functions for drawing 3D images.
The GPU evolved from the beginning as a complement to the CPU, lightening that unit's workload
Four years later, a company called 3dfx introduced its first graphics card, the Voodoo1. It handled 3D rendering only and required a separate 2D graphics card to be installed alongside it, and it quickly became popular among gamers.
In 1997, in response to Voodoo, Nvidia released the RIVA 128 graphics accelerator. Like the Voodoo1, the RIVA 128 let video card manufacturers pair a graphics accelerator with 2D graphics, but its 3D rendering was weaker than the Voodoo1's.
After the RIVA 128, 3dfx released the Voodoo2 as the successor to the Voodoo1. It was the first graphics card to support SLI, allowing two or more cards to be linked to produce a single output. 3dfx's SLI stood for Scan-Line Interleave; Nvidia later revived the name as Scalable Link Interface, a now-obsolete technology for parallel processing that increased graphics processing power.
The term GPU was popularized in 1999 with the worldwide launch of Nvidia's GeForce 256, marketed as the world's first graphics processor: a single-chip processor with integrated transform (converting a 3D scene to a 2D view), lighting and surface-color calculation, and the ability to draw parts of the image after rendering. To compete, ATI Technologies coined the term VPU (Visual Processing Unit) with the release of its Radeon 9700 in 2002.
With the passage of time and the advancement of technology, GPUs were equipped with programmable capabilities, which made Nvidia and ATI enter the competition scene and introduce their first graphics processors (GeForce for Nvidia and Radeon for ATI).
Nvidia officially entered the graphics card market in 1999 with the release of the GeForce 256. This card is widely regarded as the world's first true GPU; it carried 32 MB of DDR memory and fully supported DirectX 7.
Alongside the race to speed up computer graphics calculations and improve their quality, the makers of video games and gaming consoles also pushed the field forward: Sega with the Dreamcast, Sony with the PS1, and Nintendo with the Nintendo 64.
How to produce 3D graphics
The process of producing 3D graphics is divided into three main stages:
3D modeling
3D modeling is the process of developing a mathematical, coordinate-based representation of a physical surface or object (animate or inanimate) in three dimensions. It is done in specialized software by manipulating edges, vertices, and polygons in a simulated 3D space.
Physical objects are represented by a set of points in three-dimensional space, connected by geometric elements such as triangles, lines, and curved surfaces. Basically, 3D models are first created by connecting points to form polygons. A polygon is an area bounded by at least three vertices (a triangle), and the overall integrity of the model, and its suitability for animation, depends on the structure of these polygons.
Three-dimensional models are made in two main ways: polygon (vertex) modeling, which connects grid lines of vectors, and curve modeling, which weights each control point. Today, because of the greater flexibility and faster rendering of the first method, the vast majority of 3D models are produced as textured polygon meshes. One of the main tasks of graphics cards is texture mapping, which applies a texture to an image or 3D model: adding a stone texture makes a model look like real stone, and a texture resembling a human face gives a scanned 3D model a face.
In the second method, the model is shaped by weighted control of curve points: the points themselves are not interpolated; instead, curved surfaces are created by relatively increasing the polygon density. Increasing the weight of a point pulls the curve closer to that point.
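The polygon-mesh representation described above can be sketched in a few lines: a shared vertex list plus triangles that index into it. A common per-polygon computation, finding the face normal (used later for shading), is included; the specific vertices are invented for illustration.

```python
import math

# A minimal polygonal model: shared vertex list plus triangles that index into it.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
triangles = [(0, 1, 2)]  # one polygon with three vertices

def face_normal(tri):
    # Normal = normalized cross product of two edge vectors of the triangle.
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = (vertices[i] for i in tri)
    ux, uy, uz = bx - ax, by - ay, bz - az
    vx, vy, vz = cx - ax, cy - ay, cz - az
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)

print(face_normal(triangles[0]))  # (0.0, 0.0, 1.0) -- the triangle faces +Z
```

Sharing vertices between adjacent triangles, rather than duplicating them, is what keeps large meshes compact and is exactly what GPU index buffers do.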
Layout and animation
After modeling, the placement and movement of objects (models, lights, and so on) in a scene must be set before the objects are rendered into an image; that is, before rendering, the objects must be arranged in the scene. Defining the location and size of each object establishes the spatial relationships between them. Motion, or animation, is the temporal description of an object: how it moves and changes shape over time. Common layout and animation methods include keyframing, inverse kinematics, and motion capture; these techniques are often used in combination.
Rendering
In the last stage, computer calculations based on the placement of lights, the types of surfaces, and other specified factors produce the final, polished image. At this point, materials and textures are the data used for rendering.
Light transport from one surface to another, and how light scatters and interacts across surfaces, are the two basic operations in rendering, usually implemented by 3D graphics software. Rendering is thus the final process of creating a 2D image or animation from a 3D model and a prepared scene, using several different and often specialized methods; it may take only a fraction of a second, or sometimes several days, per image or frame.
Shading technique
After graphics processors arrived to reduce the CPU's workload and enable far more impressive image quality, Nvidia and ATI gradually became the main players in computer graphics. The two rivals worked hard to outdo each other, each competing by adding stages to modeling and rendering and improving techniques. The shading technique can be seen as the birthplace of their rivalry.
In the computer graphics industry, shading is the process of changing the color of an object, surface, or polygon in a 3D scene based on factors such as its distance from the light source and the angle between the surface and the light.
Shaders calculate the appropriate levels of light, dark, and color while rendering a 3D scene.
Shading during the rendering process is done by a program called Shader, which calculates the appropriate levels of light, dark, and color during the rendering of a 3D scene. In fact, shaders have evolved to perform a variety of specialized functions in graphic effects, video post-processing, as well as general-purpose computing on GPUs.
Shader changes the color of surfaces in a 3D model based on the angle of the surface to the light source or light sources.
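The angle-to-light rule described above is usually expressed as Lambertian (diffuse) shading: brightness falls off with the cosine of the angle between the surface normal and the light direction, which is just their dot product. A minimal sketch, with invented example vectors:

```python
import math

# Lambertian (diffuse) shading sketch: brightness = cos(angle between the
# surface normal and the light direction) = their dot product, clamped at 0.
def lambert(normal, light_dir):
    # both vectors are assumed normalized; clamp so back-facing surfaces are black
    dot = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, dot)

up = (0.0, 0.0, 1.0)  # a surface facing straight up
print(lambert(up, (0.0, 0.0, 1.0)))        # 1.0   -- light hits head-on
print(lambert(up, (0.0, 1.0, 0.0)))        # 0.0   -- light grazes the surface
s = math.sqrt(0.5)
print(round(lambert(up, (0.0, s, s)), 3))  # 0.707 -- light at 45 degrees
```

This single dot product, evaluated per surface point, is what makes the shaded box in the third image below look solid.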
In the first image that you can see below, all the surfaces of the box are rendered with one color and only the edge lines are marked to make the image better visible.
The second image shows the same model without the edge lines; In this case, it is a bit difficult to recognize where one face of the box ends and then starts again.
In the third image, the shading technique has been applied; The final image looks more realistic and the surfaces are easier to recognize.
Shaders are widely used in film post-production, computer graphics, and video games to produce a wide range of effects. Shaders are simple programs that describe the traits of a vertex or a pixel: vertex shaders describe properties such as the position, texture coordinates, and color of each vertex, while pixel shaders describe the color, z-depth, and alpha value of each pixel.
There are three types of shaders in common use (pixel, vertex, and geometric shaders). Older graphics cards use separate processing units for each shader, but newer cards are equipped with integrated shaders that can run any technique and provide more optimized processing.
Pixel shaders
Pixel shaders compute the color and other attributes of each pixel region. The simplest pixel shaders output only a single screen pixel's color. Beyond simple lighting models, pixel shaders can also produce more complex effects: color-space changes, saturation, brightness (HSL/HSV) and contrast adjustments, blur, light bloom, volumetric lighting, normal mapping (for depth effects), bokeh, cel shading, posterization, bump mapping, distortion, blue-screen and green-screen effects, edge and motion highlighting, and simulated psychedelic effects.
Of course, in 3D graphics a pixel shader alone cannot create complex effects, because it operates on a single region without access to vertex information. However, if the contents of the entire screen are passed to the shader as a texture, the shader can sample the surrounding pixels, enabling a wide range of 2D post-processing effects such as blur or edge detection and enhancement.
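The per-pixel model described above can be sketched on the CPU: the same small function runs once for every pixel, here applying a grayscale "post-processing pass" over a tiny invented RGB image (the standard luma weights are used for perceived brightness).

```python
# A pixel shader conceptually runs the same small function once per pixel.
# On a GPU, every pixel would be computed in parallel; here we loop.
def shade_pixel(rgb):
    r, g, b = rgb
    # Rec.601 luma weights for perceived brightness
    y = round(0.299 * r + 0.587 * g + 0.114 * b)
    return (y, y, y)

image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
gray = [[shade_pixel(p) for p in row] for row in image]
print(gray[0][0])  # (76, 76, 76) -- pure red maps to a dark gray
```

Because each pixel depends only on its own input, all four calls are independent, which is precisely why this workload parallelizes so well on GPU hardware.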
Vertex shaders
Vertex shaders are the most common type of 3D shaders and run once on each vertex given to the GPU. The purpose of using these shaders is to convert the three-dimensional position of each vertex in the virtual space into two-dimensional coordinates for display on the monitor. Vertex shaders can manipulate properties such as position coordinates, color, and texture, but cannot create new vertices.
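The 3D-to-2D conversion that vertex shaders perform can be reduced to a perspective divide. This sketch omits the model/view matrices a real vertex shader would apply and assumes a focal length of 1, so the numbers are purely illustrative.

```python
# Core job of a vertex shader: map a 3D position to 2D screen coordinates.
# Minimal perspective-divide sketch (no model/view matrices, focal length f=1).
def project(x, y, z, f=1.0):
    # points farther away (larger z) land closer to the screen center
    return (f * x / z, f * y / z)

print(project(2.0, 1.0, 2.0))  # (1.0, 0.5)
print(project(2.0, 1.0, 4.0))  # (0.5, 0.25) -- same point, twice as far away
```

Note that the function transforms positions but never creates new ones, matching the limitation stated above: vertex shaders cannot add vertices.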
Shaders needed parallelism to compute and render quickly; the concept of the thread was born here
In 2000, ATI introduced the Radeon R100 series of graphics cards and with this work launched a lasting legacy of the Radeon series of graphics cards. The first Radeon graphics cards were fully compatible with DirectX 7 and used ATI’s HyperZ technology, which actually uses three technologies: Z compression, Z fast cleanup, and Z hierarchical buffering to conserve more bandwidth and improve rendering efficiency.
In 2001, Nvidia released the GeForce 3 series, the first graphics cards in the world with programmable pixel shaders. Five years later, ATI was bought by AMD, and the Radeon series has been sold under the AMD brand ever since. Shader programs needed parallelism for fast calculation and rendering; to address this, Nvidia applied the concept of the thread to graphics processors, which we will explain further below.
Difference between GPU and CPU
The GPU evolved from the beginning as a complement to the CPU, lightening that unit's workload. Today, processors keep growing more powerful through new architectural advances, higher frequencies, and more cores, while GPUs are developed specifically to accelerate graphics processing.
Processors are designed to execute a single task with the lowest latency and highest speed, while also switching between operations very quickly. In fact, the processing model in CPUs is serial.
On the other hand, the graphics processor has been specifically developed to optimize the performance of graphics processing and provides the possibility of doing things simultaneously and in parallel. In the image below, you can see the number of cores of a processor and the number of cores of a graphics processor; This image shows that the main difference between CPU and GPU is the number of cores they have to process a task.
Comparing the overall architecture of processors and graphics cards reveals many similarities: both use layered cache structures, and both have a memory controller and main RAM. An overview of modern processor architecture shows that low-latency memory access is the most important factor in CPU design, with a heavy focus on the memory and cache layers (the exact layout depends on vendor and processor model).
Each processor consists of several cache layers:
Level one cache memory (L1) is the fastest, smallest, and closest memory to the processor and stores the most important data needed for processing.
The next layer is the level two cache memory (L2) or the external cache memory, which is slower and larger than L1.
The L3 cache in the processor is shared by all cores and is larger but slower than the L1 and L2 caches. Some designs add an L4 cache, which is likewise larger and slower still; the two terms are sometimes used interchangeably. If data is not found in any cache layer, it is fetched from main RAM (DDR).
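The payoff of the L1 → L2 → L3 → RAM cascade described above can be put into numbers with an average-access-time estimate. The latencies and hit rates below are invented round numbers for illustration, not measurements of any real processor.

```python
# Illustrative average memory access time for a cache hierarchy.
# Each entry: (level name, latency in cycles, fraction of ALL accesses
# that are ultimately served at this level). Values are made up.
LEVELS = [
    ("L1", 4, 0.90),
    ("L2", 12, 0.06),
    ("L3", 40, 0.03),
    ("RAM", 200, 0.01),
]

def average_latency(levels):
    return sum(latency * share for _, latency, share in levels)

print(round(average_latency(LEVELS), 2), "cycles on average")  # 7.52
```

Even though RAM is 50× slower than L1 here, serving 90% of accesses from L1 keeps the average under 8 cycles, which is why CPU design invests so heavily in the cache hierarchy.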
Looking at a general overview of GPU architecture (the exact layout depends on the manufacturer and model), the design focuses on keeping the many available cores busy rather than on fast cache access or low latency. In fact, the GPU consists of several groups of cores, each group sharing a level-one cache.
Compared with the processor, the graphics processor has fewer cache layers with less capacity; it dedicates more transistors to computation and cares less about fast data retrieval from memory. The GPU is built around the approach of parallel calculation.
High-performance computing is one of the most effective and reliable uses of parallel processing for running advanced applications; precisely for this reason, GPUs are well suited to such calculations.
In simple terms, let’s say you have two options for doing some kind of heavy computation:
Using a small number of powerful cores that perform processes serially.
Using a high number of not-so-powerful cores that can perform several processes simultaneously.
In the first scenario, losing one core is a serious problem: the remaining cores must absorb its work and processing power drops sharply. In the second scenario, losing one core causes no noticeable change, and the rest of the cores keep working.
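The two scenarios above can be put into back-of-the-envelope numbers. The core counts and per-core rates below are invented for illustration (a "strong" core doing 10 units of work per second, a "weak" core doing 1), and the work is assumed perfectly parallelizable.

```python
# Back-of-the-envelope throughput for the two scenarios above.
def throughput(cores, per_core_rate):
    return cores * per_core_rate

cpu_like = throughput(8, 10.0)      # few strong cores
gpu_like = throughput(2048, 1.0)    # many weak cores

print(cpu_like, gpu_like)  # 80.0 2048.0

# Losing one core costs 12.5% of the CPU-like design's throughput,
# but only ~0.05% of the GPU-like design's:
print(10.0 / cpu_like, 1.0 / gpu_like)
```

The same arithmetic explains both observations in the text: the many-weak-cores design wins on total parallel throughput and degrades gracefully when a core fails.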
The GPU performs several tasks at the same time and the CPU performs one task at a very high speed
The GPU's memory bandwidth is much higher than the processor's, so it handles high-volume parallel processing far better. The crucial caveat is that the GPU is built for parallel processing: if an algorithm is inherently serial and cannot be parallelized, it runs poorly on the GPU and can even slow the system down. CPU cores are more powerful than GPU cores, but the CPU's memory bandwidth is much lower than the GPU's.
Familiarity with GPU architecture
At first glance, the CPU has larger but fewer computing units than the GPU. Keep in mind, though, that a CPU core is both faster and smarter than a GPU core.
Over time, the frequency of processor cores has gradually increased to improve performance, and on the contrary, the frequency of GPU cores has been reduced to optimize consumption and accommodate installation in phones or other devices.
The ability to execute instructions out of order is one mark of the CPU core's intelligence. As mentioned, the central processing unit can execute instructions in a different order than the one given to it, or predict the instructions it will need in the near future and prepare their operands before execution, optimizing the system and saving time.
By contrast, a GPU core takes on no such complexity and does little beyond executing its instructions. Traditionally, the main specialty of GPU cores has been floating-point operations of the form A × B + C. When the product is rounded before the addition, the operation is called multiply-add (MAD); when the full-precision product is kept (without truncation) and the result is rounded only once at the end, it is called fused multiply-add (FMA).
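The MAD/FMA distinction above is easy to see in code: MAD rounds the product a×b to floating point before adding c, while FMA keeps the exact product and rounds only once at the end. The numbers below are chosen (as an illustration) so that the intermediate rounding destroys the entire answer; exact rational arithmetic stands in for the fused path.

```python
from fractions import Fraction

# MAD rounds a*b before adding c; FMA rounds only once, at the end.
e = 2.0 ** -30
a = b = 1.0 + e            # (1+e)^2 = 1 + 2e + e^2, and e^2 = 2^-60 is below
c = -(1.0 + 2.0 ** -29)    # double precision near 1, so rounding a*b drops it

mad = (a * b) + c                                     # product rounded first
fma = float(Fraction(a) * Fraction(b) + Fraction(c))  # exact product, one rounding

print(mad)  # 0.0     -- the e^2 term was lost in the intermediate rounding
print(fma)  # 2**-60  -- the fused result keeps it
```

On GPU hardware, FMA costs no more than MAD yet gives this extra accuracy, which is why modern cores default to it.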
The latest GPU microarchitectures today are no longer limited to FMA and perform more complex operations such as ray tracing or tensor kernel processing. Tensor cores and ray tracing cores are also designed to provide hyper-realistic renderings.
Tensor kernels
In recent years, Nvidia has produced graphics processors with additional cores that, alongside the shaders, serve artificial intelligence, deep learning, and neural-network processing. These cores are called tensor cores. A tensor is a mathematical object whose smallest form has zero dimensions (a scalar) and holds a single value. Increasing the number of dimensions gives the other tensor structures:
One-dimensional tensor: a vector (an n × 1 structure)
Two-dimensional tensor: a matrix (an n × m structure)
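The operation a tensor core actually hardwires is a small matrix multiply-accumulate, D = A·B + C, combining the structures listed above. A sketch on 2×2 matrices with plain lists (real tensor cores work on larger tiles, e.g. 4×4, in reduced precision):

```python
# The core tensor-core operation: D = A @ B + C on a small matrix tile.
def matmul_acc(A, B, C):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) + C[i][j]
             for j in range(n)] for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 0], [0, 1]]
print(matmul_acc(A, B, C))  # [[20, 22], [43, 51]]
```

One hardware instruction performing this whole tile at once, instead of many scalar FMAs, is what gives tensor cores their throughput on neural-network workloads.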
Tensor cores fall into the SIMD category ("single instruction, multiple data"); adding them to a GPU gives graphics a chip far smarter than a mere calculator, covering broad computing and parallel-processing needs. In 2017, Nvidia introduced a completely new architecture called Volta, designed and built for professional markets; those cards carried tensor cores, but the GeForce line did not yet use them.
At that time, tensor cores could multiply 16-bit floating-point numbers (FP16) and accumulate in 32 bits (FP32). Less than a year later, Nvidia introduced the Turing architecture; its main changes were bringing tensor cores to GeForce GPUs and supporting new data formats such as eight-bit integers.
In 2020, the Ampere architecture arrived in the A100 processors for data centers; it increased the efficiency and power of the cores, quadrupled the number of operations per cycle, and added new data formats to the supported set. Today, tensor cores remain specialized, limited pieces of hardware found in a subset of consumer graphics cards. Intel and AMD, the other two players in computer graphics, do not ship tensor cores in their GPUs, but they may offer similar technology in the future.
Tensor kernels are widely used in physics engineering and mathematics: they can perform complex calculations in electromagnetism, astronomy, and fluid mechanics.
Tensor cores can increase the resolution of images: they take frames rendered at a lower graphics level (or lower resolution) and upscale them to higher quality after rendering.
Tensor cores increase frame rate: Tensor cores can increase frame rate in games after enabling ray tracing in games.
Ray tracing engine
In addition to cores and cache layers, GPUs may include dedicated hardware to accelerate ray tracing, which simulates light sources shining on objects and the resulting distribution of light across the scene. Fast ray tracing lets video games display more realistic, higher-quality images.
Ray tracing is one of the biggest advancements in recent years in computer graphics and the gaming industry. At first, this feature was only used in the film industry, computer image production, and in animation and visual effects, but today PS5 and XBOX X series gaming consoles also support ray tracing.
In the real world, everything we see is light striking objects and reflecting into our eyes; ray tracing does the same thing in reverse, identifying the light sources, the paths of the light rays, the materials, the types of shadow, and the amount of reflection where rays strike objects. The ray tracing algorithm renders reflections off different materials in distinct, realistic ways, draws the shadows of objects in a light beam's path according to whether they are opaque or semi-transparent, and follows the laws of physics. That is why images produced with this feature look so close to reality.
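The innermost operation of the reverse tracing described above is an intersection test: does a ray (origin plus t times a direction) hit an object? For a sphere this reduces to a quadratic equation. A minimal sketch, with camera and sphere positions invented for illustration:

```python
import math

# Ray-sphere intersection: solve |o + t*d - center|^2 = r^2 for the
# nearest t >= 0, or report a miss. This test runs billions of times
# per frame in a real ray tracer, hence the dedicated hardware.
def ray_sphere(origin, direction, center, radius):
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    a = dx * dx + dy * dy + dz * dz
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # ray misses the sphere
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else None

# Camera at the origin looking down +Z at a unit sphere centered at z=5:
print(ray_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # 4.0 (front surface)
print(ray_sphere((0, 0, 0), (0, 1, 0), (0, 0, 5), 1.0))  # None (misses)
```

From the hit distance t, a renderer computes the hit point and surface normal, then spawns further rays toward the lights, which is where shadows and reflections come from.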
Ray tracing off (right) versus ray tracing on (left)
Nvidia first shipped ray tracing in 2018 on RTX-series cards built on the Turing architecture, and later released a driver that added ray-tracing support to some GTX-series cards, although those perform noticeably worse at it than the RTX cards.
AMD brought ray tracing to the PS5 and Xbox Series X/S consoles with its RDNA 2 architecture. Enabling the feature in games reduces the frame rate because of the heavy processing load; for example, a game that runs at 60 fps in normal mode may manage only 30 fps with ray tracing on.
Frame rate, measured in frames per second (fps), is a good measure of GPU performance, indicating how many completed images can be displayed each second. For comparison, the human eye perceives roughly 25 frames per second as continuous motion, but fast-action games need at least 60 frames per second to render smoothly.
What is GPGPU?
Many users took advantage of the GPU's fast parallel processing, offloading any computation that could be parallelized onto it regardless of the graphics processor's traditional job. The GPGPU, or general-purpose graphics processing unit, emerged to serve exactly this use.
GPGPU (abbreviation of General Purpose Graphics Processing Unit) is the graphics processing unit that also performs non-specialized calculations (or CPU tasks) .
In fact, GPGPUs perform tasks that were previously done by powerful CPUs, such as physics calculations, encryption and decryption, scientific computing, and mining digital currencies such as Bitcoin. Because GPUs are built for massive parallelism, they can take computational load off even the most powerful processors: the same cores used to shade many pixels simultaneously can just as well process many data streams simultaneously, although these cores are not as complex as CPU cores.
The GeForce 3 was Nvidia’s first GPU to feature programmable shaders. At the time, programmers aimed to make rasterized or bitmapped 3D graphics more realistic, and this Nvidia GPU provided capabilities such as 3D transformation, roughness mapping, and lighting calculations.
After the GeForce 3 came ATI's Radeon 9700, a DirectX 9 GPU with programmability closer to a CPU's. With the arrival of Windows Vista and DirectX 10, unified shader cores became standard, and this newfound flexibility of GPUs enabled more CPU-style computing.
Since the release of DirectX 10 with Windows Vista, more focus has been placed on GPGPU, and higher-level languages have been developed to ease programming GPU computations. AMD and Nvidia each pursued GPGPU development with their own programming interfaces: the open standard OpenCL and Nvidia's CUDA.
What is CUDA?
Simply put, CUDA lets programs use the GPU as a co-processor. The CPU hands specific tasks to a CUDA-capable graphics card, which is optimized to compute things like lighting, motion, and interaction as fast as possible, even processing multiple streams simultaneously when needed. The processed data is then sent back to the processor, which uses it in the larger, more important calculations.
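The co-processor workflow just described can be sketched in miniature: the host prepares a buffer, "copies" it to the device, a kernel function runs once per element, and the results come back. The function names here are illustrative, not CUDA's actual API, and the device is only simulated.

```python
# The CUDA workflow in miniature (names are illustrative, not CUDA's API).
def kernel(x):
    # per-element work, e.g. one lighting calculation
    return x * x + 1

def launch(kernel, host_buffer):
    device_buffer = list(host_buffer)         # 1. host -> device copy
    out = [kernel(x) for x in device_buffer]  # 2. kernel runs per element
    return out                                # 3. device -> host copy

print(launch(kernel, range(5)))  # [1, 2, 5, 10, 17]
```

On real hardware, step 2 runs across thousands of CUDA cores at once, and the host/device copies in steps 1 and 3 are often the actual bottleneck.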
Advantages of CUDA kernels
Computer systems are driven by software, so most processing must be expressed in program code. Since CUDA's strength lies in computation, data generation, and image manipulation, CUDA cores help programmers dramatically cut processing time for effects, rendering, and output, especially for scaling operations and for simulations such as fluid dynamics and forecasting. CUDA also excels at light sources and ray tracing, and tasks such as rendering effects, encoding, and video conversion run much faster with its help.
CUDA is designed to work with programming languages such as C, C++, and Fortran, making it easier for experts in parallel programming to use the GPU. In contrast, previous APIs such as Direct3D and OpenGL required advanced skills in graphics programming.
This design is more efficient than CPUs for parallel processing of large blocks of data, as in the following examples:
Cryptographic hash functions
Machine learning
Molecular dynamics simulation
Physics engines
Sorting algorithms
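The first item in the list, cryptographic hashing, parallelizes well because each message is independent of the others. A small sketch comparing a serial pass with a thread-pool version (the thread pool stands in for the GPU's parallel lanes; the messages are invented):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hashing many independent messages: serial vs. parallel, identical results.
messages = [f"block-{i}".encode() for i in range(100)]

serial = [hashlib.sha256(m).hexdigest() for m in messages]
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(lambda m: hashlib.sha256(m).hexdigest(), messages))

print(parallel == serial)  # True -- map() preserves order and values
```

Because no message depends on another, the work splits cleanly across however many workers exist, which is exactly the property that makes these workloads GPU-friendly.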
Disadvantages of CUDA kernels
CUDA is Nvidia’s proprietary approach to introducing a GPU-like graphics processor (GPGPU), so you should only use Nvidia’s products to take advantage of it. For example, if you have a Mac Pro, you cannot use the capabilities of CUDA kernels, because this device uses AMD graphics for graphics processing; Additionally, fewer applications support CUDA than its alternative.
OpenCL: the alternative to CUDA
OpenCL is a relatively new open standard that serves as an alternative to CUDA. Anyone can implement it in their hardware or software without paying for the technology or a proprietary license. Where CUDA uses the graphics card as a co-processor, OpenCL hands the data over entirely and treats the card more like a discrete processor. That difference in usage is hard to measure precisely, but one measurable difference is that coding for OpenCL is generally harder than coding for CUDA. On the other hand, with OpenCL you are not tied to any vendor, and support is so widespread that most applications do not even advertise it.
CUDA and OpenCL vs. OpenGL
As mentioned before, OpenGL can be seen as the start of this story; however, this programming interface was not designed to use the graphics card as a general-purpose processor. Instead, it simply draws pixels and vertices on the screen. OpenGL is a system that lets the graphics card render 2D and 3D images much faster than a CPU can. Just as CUDA and OpenCL are alternatives to each other, OpenGL is an alternative to systems like DirectX on Windows.
Simply put, OpenGL renders images very quickly, while OpenCL and CUDA handle the calculations needed when videos interact with effects and other media. OpenGL may place content in the editing interface and render it, but when that content is color-corrected, CUDA or OpenCL does the math that changes the pixels. Both OpenCL and CUDA can work alongside OpenGL, and a system whose graphics card supports the latest OpenGL will always be faster than a computer relying on a CPU with integrated graphics.
OpenCL or CUDA
The main difference between CUDA and OpenCL is that CUDA is a proprietary framework created by Nvidia, whereas OpenCL is an open standard. Assuming both your software and hardware support the two options, CUDA is the recommended choice if you have an Nvidia card; it works faster than OpenCL in most cases. Nvidia cards also support OpenCL, although AMD cards tend to get more out of OpenCL. The choice between CUDA and OpenCL ultimately depends on your needs, the type of work, the system, the workload, and the performance each delivers.
For example, Adobe notes on its website that, with very few exceptions, everything CUDA does for Premiere Pro can also be done with OpenCL. Even so, most users who have compared the two standards find CUDA faster in Adobe products.
The most prominent brands
In the graphics processor market, AMD and Nvidia are the best-known names. The former was once ATI, founded in 1985; its GPUs now ship under the Radeon brand. Nvidia became ATI's rival with the release of its first graphics processor in 1999. AMD bought ATI in 2006 and now competes with Nvidia and Intel on two different fronts. In practice, personal taste and brand loyalty are among the biggest factors separating AMD and Nvidia buyers.
Nvidia recently released its GTX 10 series, but AMD's equivalents are generally the more affordable choice. Other competitors such as Intel are also in the game, implementing graphics solutions on the chip, but for now AMD and Nvidia remain the most prominent brands in the field. Nvidia cards, with more cores and higher frequencies, are well suited to gaming; with their smaller caches, however, they cannot match AMD cards in some parallel workloads such as mining digital currencies. Below we briefly introduce the three leading brands in graphics and graphics architecture; in the near future we will examine these competitors, their architectures, and their products in detail in a separate article.
Intel
Intel is one of the largest makers of computer equipment in the world, producing hardware including microprocessors, semiconductors, integrated circuits, processors, and graphics processors; AMD and Nvidia are its two prominent competitors, each with its own fans. Intel's first attempt at a dedicated graphics card was the Intel740, released in 1998; it performed below market expectations, forcing Intel to stop developing discrete graphics products, though the technology survived in the Intel Extreme Graphics product line. After this failed attempt, Intel tried its luck in graphics again in 2009 with the Larrabee architecture; that previously developed technology was later used in the Xeon Phi line.
In April 2018, news broke that Intel was assembling a team to develop discrete graphics processing units aimed at both the data center and gaming markets, bringing in Raja Koduri, the former head of AMD's Radeon Technologies Group. Intel announced early on that it planned to introduce a discrete GPU in 2020. The first Xe discrete GPU, codenamed DG1, appeared in October 2019 as a software development vehicle and was expected to serve as a GPGPU for data center and self-driving applications. The product line began on Intel's 10 nm lithography, moved to 7 nm in 2021, and used 3D stacked packaging (Intel's Foveros technology).
Intel Xe, or Xe for short, is the name of Intel's graphics architecture, first used as the integrated graphics in its 11th-generation Core processors. The company has also developed discrete desktop graphics cards based on the Xe architecture under the Arc Alchemist brand. Xe is a family of architectures with significant differences between its members; it consists of the Xe-LP, Xe-HP, Xe-HPC, and Xe-HPG microarchitectures.
Unlike previous Intel GPUs, which used execution units (EUs) as the basic computing block, Xe-HPG and Xe-HPC use Xe cores. An Xe core contains vector and matrix computing units, called vector engines and matrix engines, and is also equipped with an L1 cache and other hardware.
Xe-LP (Low Power): Xe-LP is the low-power variant of the Xe architecture, used as the integrated graphics in 11th-generation Intel Core processors, in the Iris Xe MAX mobile discrete GPU (codenamed DG1), and in the H3C XG310 server GPU (codenamed SG1). It delivers a higher operating frequency at the same voltage as the previous generation. In its largest configuration, Xe-LP has 50% more execution units than the 64 EUs of the Gen11 graphics architecture found in the Ice Lake series, that is, 96 EUs, significantly increasing its compute resources. Alongside the larger EU count, Intel reworked the execution units themselves: instead of the previous generation's two four-wide arithmetic and logic units (ALUs) per EU, Xe-LP uses an eight-wide ALU arrangement. Xe-LP also adds an L1 cache, which reduces data-access latency, and supports end-to-end data compression, which increases effective bandwidth and speeds up tasks such as game streaming and video-chat recording.
Xe-HP (High Performance): Xe-HP is the high-performance, data-center-oriented variant of the Xe architecture, optimized for FP64 performance and multi-tile scalability.
Xe-HPC (High-Performance Computing): Xe-HPC is the high-performance computing variant of the Xe architecture. Each core in Xe-HPC includes 8 vector engines and 8 matrix engines, along with a large 512KB L1 cache.
Xe-HPG (High-Performance Graphics): Xe-HPG is the high-performance gaming-oriented variant of the Xe architecture; it builds on the Xe-LP microarchitecture with improvements drawn from Xe-HP and Xe-HPC. Xe-HPG focuses on graphics performance and supports hardware-accelerated ray tracing, DisplayPort 2.0, neural-network-based supersampling (XeSS, similar to Nvidia's DLSS), and DirectX 12 Ultimate. Each Xe-HPG core contains 16 vector engines and 16 matrix engines.
Nvidia
Nvidia was founded in 1993 and is one of the main manufacturers of graphics cards and graphics processors (GPU). Nvidia produces different types of graphics units, each offering unique capabilities. In the following, we briefly introduce the micro-architectures of Nvidia graphics units and the improvements of each compared to the previous generation:
Kelvin: The Kelvin microarchitecture was released in 2001 and was used in the GPU of the original Xbox game console. GeForce 3 and GeForce 4 series graphics units were released with this microarchitecture.
Rankine: Nvidia introduced the Rankine microarchitecture in 2003 as an improved version of the Kelvin microarchitecture. This microarchitecture was used in the GeForce 5 graphics series. The video memory capacity in this microarchitecture was 256 MB and it supported vertex and fragment shading programs. Vertex shaders change the geometry of the scene and create a 3D layout. Fragment shaders also specify the color of each pixel in the rendering process.
Curie: Curie, the microarchitecture used in GeForce 6 and 7 series graphics, was released in 2004 as the successor to Rankine. The video memory capacity in Curie-based cards reached 512 MB, and it was the first generation of Nvidia GPUs to support PureVideo video decoding.
Tesla: The Tesla microarchitecture was introduced in 2006 and made several significant changes to Nvidia's GPU lineup. In addition to the GeForce 8, 9, 100, 200, and 300 series, the Tesla architecture was used in Quadro products and in compute cards aimed at workloads other than graphics processing. In 2020, Nvidia stopped using the Tesla brand name to avoid confusion with Tesla, the electric-car maker.
Fermi: Fermi was released in 2010 and offered features such as up to 512 CUDA cores, a configurable 64 KB L1 cache/shared memory partition per multiprocessor, and error-correcting code (ECC) support. The GeForce 400 and GeForce 500 series graphics cards were produced on this microarchitecture.
Kepler: The Kepler microarchitecture followed Fermi in 2012 and came with key improvements over the previous generation. It introduced a new streaming multiprocessor design (SMX) and supported the TXAA anti-aliasing method. In temporal anti-aliasing, information from past frames is combined with the current frame to remove jagged edges: each pixel is sampled once per frame, but the sample position is jittered within the pixel from frame to frame, and samples accumulated from past frames are blended with the current frame's samples to produce a better-quality image. Kepler consumed less power than Fermi, and its CUDA core count rose to 1,536. The microarchitecture is capable of automatic overclocking of the graphics processor and is equipped with GPUDirect, which lets GPUs communicate without going through the CPU. Nvidia used Kepler in some GeForce 600, GeForce 700, and GeForce 800M series graphics units.
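The temporal accumulation described above can be sketched as an exponential moving average over frames. This is a toy illustration of the general idea, not Nvidia's actual TXAA implementation; `temporal_accumulate` and the sample pattern are illustrative assumptions.

```python
# Toy sketch of temporal anti-aliasing accumulation (not Nvidia's actual
# TXAA): each frame, the current jittered sample is blended into the
# accumulated history, so edge pixels converge to a smoothed value.

def temporal_accumulate(history: float, current_sample: float, alpha: float = 0.1) -> float:
    """Blend the current frame's sample into the running history.

    alpha controls how quickly new samples replace the history:
    a small alpha gives a smoother image but more ghosting on motion.
    """
    return alpha * current_sample + (1.0 - alpha) * history

# An aliased edge alternates between 0.0 and 1.0 as the jittered sample
# position lands on either side of the edge; accumulation converges
# toward the true ~0.5 coverage of the pixel.
pixel = 0.0
for frame in range(200):
    sample = 1.0 if frame % 2 == 0 else 0.0  # jittered samples straddle the edge
    pixel = temporal_accumulate(pixel, sample, alpha=0.1)

print(round(pixel, 2))  # settles near 0.5 instead of flickering between 0 and 1
```

The same blend is what causes the well-known ghosting trade-off: history lags behind fast motion unless it is rejected or reprojected.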
Maxwell: The Maxwell microarchitecture was released in 2014. Compared to Fermi and Kepler, the first generation of Maxwell GPUs was more efficient thanks to improved logical partitioning, reduced dynamic power dissipation by gating clocks when circuits are idle, better instruction scheduling and workload balancing, 64 KB of dedicated shared memory for each streaming multiprocessor, improved performance through native shared-memory operations, and support for dynamic parallelism. Some GeForce 700, GeForce 800M, GeForce 900, and Quadro Mxxx series graphics units were released with the Maxwell microarchitecture.
Pascal: Pascal replaced Maxwell in 2016. GPUs based on this microarchitecture (the GeForce 10 series) brought improvements over the previous generation such as NVLink support for interconnect speeds beyond the PCIe interface, HBM2 memory with up to 720 GB/s of bandwidth, compute preemption (temporarily interrupting a running task so that a higher-priority task can run), and dynamic load balancing to optimize the use of GPU resources.
Volta: Volta was a distinctive microarchitecture released in 2017. Whereas most previous Nvidia microarchitectures were developed for general use, Volta GPUs targeted professional applications; in addition, Tensor cores appeared for the first time in this microarchitecture. As mentioned earlier, Tensor cores are a type of processing core that performs specialized matrix operations and is used chiefly in artificial intelligence and deep learning. The Tesla V100, Tesla V100S, Titan V, and Quadro GV100 were developed on the Volta microarchitecture.
Turing: The Turing microarchitecture was introduced in 2018; in addition to Tensor cores, it brought dedicated ray-tracing hardware to a range of consumer-focused GPUs. Turing supports RTX real-time ray tracing and handles heavy workloads such as virtual reality (VR). Nvidia has used this microarchitecture in its GeForce 16, GeForce 20 (RTX), Quadro RTX, and Tesla T4 products.
Ampere: Ampere is Nvidia's newest microarchitecture at the time of writing, used mostly for high-performance computing (HPC) and artificial-intelligence applications. It features Tensor cores, the third-generation NVLink interface, structured sparsity (zeroing out unneeded parameters to accelerate AI models), second-generation ray tracing, and Multi-Instance GPU (MIG) capability for partitioning one GPU into separate instances and optimizing the use of its CUDA cores. The Nvidia GeForce 30 series, along with workstation and data-center products, is built on this microarchitecture.
In general, Turing may be considered Nvidia's most popular microarchitecture, because its combined ray-tracing and rendering capabilities create impressive 3D animations and near-photorealistic images. According to Nvidia, real-time ray tracing on Turing-based graphics units can compute billions of rays per second when generating graphics images.
AMD
AMD (Advanced Micro Devices) was founded in 1969 and is now a prominent competitor to Nvidia and Intel in processors and graphics processors. After buying ATI in 2006, it continued developing ATI's products under its own brand name. AMD graphics units are produced in several series:
Radeon series: the common, mainstream series inherited from ATI.
Mobility Radeon Series: Includes AMD’s low-power graphics, mostly used in laptops.
FirePro series: powerful AMD graphics designed for workstations.
Radeon Pro series: the new generation of FirePro graphics.
AMD graphics cards used to be numbered with four digits, for example the Radeon HD 7750.
In this scheme, a bigger model number within the same generation means a stronger, newer card. For example, the HD 8770 is stronger and more up to date than the HD 8750; this does not hold across generations, though, so a 7000-series card cannot be judged weaker than an 8000-series card on the generation digit alone, without checking. AMD changed its naming scheme with the Radeon RX 5700 series; the RX 5700 XT and RX 5700 were the first cards launched under the new scheme. AMD graphics units also come in three general categories: the R5, R7, and R9 series.
R5 and R6 series: low-end and relatively weak AMD graphics.
R7 and R8 series: AMD’s mid-range graphics are suitable for editing in programs such as Photoshop and After Effects.
R9 series: AMD's most powerful graphics belong to this family and provide solid gaming performance; some R9 series graphics are even designed for virtual reality (VR) devices.
Currently, the main active suffix for AMD graphics units is XT, which indicates a higher-end variant of a product with better performance and higher frequencies.
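The naming rules above can be made concrete with a small parser. This is a hypothetical toy helper for illustration only (`parse_model` is not an AMD tool, and real product lines have exceptions): the first digit is read as the generation and the remaining digits as the tier within it.

```python
# Toy illustration of how AMD's older four-digit model numbers encode
# generation and performance tier. parse_model is a hypothetical helper;
# real product lines have exceptions to this pattern.

def parse_model(name: str) -> dict:
    """Split a model string like 'HD 8770' or 'RX 5700 XT' into its parts."""
    parts = name.split()
    prefix = parts[0]                            # series, e.g. 'HD' or 'RX'
    number = parts[1]                            # four-digit model number
    suffix = parts[2] if len(parts) > 2 else ""  # e.g. 'XT' for higher-clocked parts
    return {
        "series": prefix,
        "generation": int(number[0]),  # first digit: generation
        "tier": int(number[1:]),       # remaining digits: tier within that generation
        "suffix": suffix,
    }

a = parse_model("HD 8770")
b = parse_model("HD 8750")
# Within the same generation, a larger tier number means a stronger card:
assert a["generation"] == b["generation"] and a["tier"] > b["tier"]

print(parse_model("RX 5700 XT"))
```

Note the comparison is only meaningful when the generation digits match, mirroring the caveat in the text.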
The difference between graphics processor and graphics card
Since the graphics processor is a specialized unit for processing and rendering computer graphics and has been optimized for this purpose, it performs the task much more efficiently than a central processor. This chip handles most in-game graphics calculations, image rendering, color management, and so on, and implements advanced graphics techniques such as ray tracing and shading. The graphics card, on the other hand, is a physical hardware part of a computer system that carries many electronic components.
Graphics card production technology has changed a great deal from the past to today; two or three decades ago these parts were known as display cards or video cards. At that time, graphics cards lacked today's sophistication, and all they did was put images and video on the screen. As graphics capabilities grew and cards gained hardware accelerators for various graphics techniques, the name "graphics card" gradually came into use for these parts.
Graphics card components
Today, these parts are more powerful than before and can serve different purposes, such as gaming, with different technologies. Not all graphics cards have exactly the same components, but the key parts are the same. To use one, you need to install the appropriate graphics driver for your card on the system; the driver contains the instructions for recognizing and operating the card and determines how games and programs run on it. In addition to the graphics processor, a graphics card includes video memory, a printed circuit board (PCB), connectors, and cooling. You can get to know these parts in the next picture.
Video memory
Video memory is where processed data is stored; it is separate from the system's main RAM and is usually a dedicated type such as GDDR. It comes in different capacities depending on the use case.
Printed circuit board
A printed circuit board (PCB) is the board on which the graphics card's parts are mounted; it may consist of several layers. The material of the board affects the quality of the card's operation.
Display connectors
After processing and calculation, the data needs cables and display connectors to reach the screen. These cables use different connector types depending on the product's intended use: for example, HDMI and DVI connectors serve high resolutions and very high frame rates, while the VGA port is used for lower resolutions. Today, most graphics cards have at least one HDMI port.
Bridge
Some high-end graphics cards can be used together with other high-end cards. This bridge feature enables parallel rendering across GPUs to increase processing power; Nvidia markets it as SLI (Scalable Link Interface), and AMD calls it CrossFire.
SLI was first used by 3dfx in the Voodoo2 graphics card line; after buying 3dfx, Nvidia acquired the technology but initially shelved it. In 2004, Nvidia revived the SLI name for modern systems based on the PCIe bus, although using it required a compatible motherboard.
CrossFire was ATI's technology for using multiple graphics cards on one motherboard at the same time. It relied on a controller chip on the board that managed the links between the cards and merged their output for display on the screen; officially, up to four graphics cards can be combined, a configuration called Quad CrossFire. The technology was first officially introduced in September 2005 to compete with SLI.
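A common mode in both SLI and CrossFire setups is alternate frame rendering, where successive frames are distributed round-robin across the installed GPUs. The sketch below is a simplified illustration of that scheduling idea only (`assign_frames` is a hypothetical helper, not vendor code):

```python
# Simplified sketch of alternate frame rendering (AFR), a common multi-GPU
# mode in SLI/CrossFire: successive frames are assigned round-robin to the
# available GPUs, so in the ideal case n GPUs render n frames in the time
# one GPU renders a single frame.

def assign_frames(num_frames: int, num_gpus: int) -> dict:
    """Map each GPU index to the list of frame numbers it renders."""
    schedule = {gpu: [] for gpu in range(num_gpus)}
    for frame in range(num_frames):
        schedule[frame % num_gpus].append(frame)
    return schedule

# Two GPUs bridged together: GPU 0 takes even frames, GPU 1 takes odd frames.
print(assign_frames(8, 2))  # {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}
```

In practice the controller chip (or driver) must still merge and present the frames in order, which is why real-world scaling falls short of the ideal.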
Graphics interface
The slot connecting the graphics card to the motherboard has changed over the years: older cards used the general-purpose PCI bus and later the dedicated AGP (Accelerated Graphics Port), while today's standard interface is PCIe, or PCI Express, which connects the card, powers it, and carries its data. PCI stands for Peripheral Component Interconnect. In 2019, the PCI-SIG consortium published the general objectives of the sixth generation of PCIe; the final specification arrived in 2022, at a time when the fourth generation had not yet become widespread and graphics cards in ordinary use did not even saturate PCIe 3.0.
PCIe 6.0 can transfer twice as much data as PCIe 5.0: up to 128 GB/s in each direction over 16 lanes (256 GB/s bidirectional), achieved by changing the signaling rather than simply raising the operating frequency. It also remains backward compatible with previous generations, so older cards can be used in new slots.
PCI Express bandwidth by data transfer rate per direction (GB/s per direction):

| Slot | PCIe 1.0 (2003) | PCIe 2.0 (2007) | PCIe 3.0 (2010) | PCIe 4.0 (2017) | PCIe 5.0 (2019) | PCIe 6.0 (2022) |
|------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| x1   | 0.25 GB/s       | 0.5 GB/s        | 1 GB/s          | 2 GB/s          | 4 GB/s          | 8 GB/s          |
| x2   | 0.5 GB/s        | 1 GB/s          | 2 GB/s          | 4 GB/s          | 8 GB/s          | 16 GB/s         |
| x4   | 1 GB/s          | 2 GB/s          | 4 GB/s          | 8 GB/s          | 16 GB/s         | 32 GB/s         |
| x8   | 2 GB/s          | 4 GB/s          | 8 GB/s          | 16 GB/s         | 32 GB/s         | 64 GB/s         |
| x16  | 4 GB/s          | 8 GB/s          | 16 GB/s         | 32 GB/s         | 64 GB/s         | 128 GB/s        |
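The per-direction figures above follow a simple pattern: PCIe 1.0 delivers roughly 0.25 GB/s per lane, each later generation doubles it, and total slot bandwidth scales with lane count. A small sketch of that rule (`pcie_bandwidth_gbs` is an illustrative helper name; real per-lane figures differ slightly because of encoding overhead):

```python
# Approximate per-direction PCIe bandwidth: ~0.25 GB/s per lane for PCIe 1.0,
# doubling with each generation, and scaling linearly with the lane count.
# Real figures differ slightly per generation due to encoding overhead.

def pcie_bandwidth_gbs(generation: int, lanes: int) -> float:
    """Approximate per-direction bandwidth in GB/s for a PCIe slot."""
    per_lane = 0.25 * 2 ** (generation - 1)
    return per_lane * lanes

# Reproduce a few cells of the table above:
print(pcie_bandwidth_gbs(3, 16))  # PCIe 3.0 x16 -> 16.0 GB/s
print(pcie_bandwidth_gbs(6, 16))  # PCIe 6.0 x16 -> 128.0 GB/s
```

The same doubling rule explains why a card limited to x8 lanes of one generation matches a full x16 slot of the previous one.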
Voltage regulator circuit
After initial power delivery through the PCIe interface, the current supplied to the graphics card must be regulated. This is the job of the voltage regulator module (VRM), which provides the electric current required by parts such as the memory and the graphics processor. Correct, steady, and timely voltage regulation increases the durability of the graphics card and optimizes energy consumption; in effect, the VRM decides how power is delivered. The circuit consists of four sections: input capacitors, MOSFETs, chokes, and output capacitors.
Input capacitors: current enters the circuit through the input capacitors, which store charge and release it to the rest of the circuit when needed.
MOSFETs: the MOSFETs act as switches that control the current coming from the input capacitors. The regulator contains high-side and low-side MOSFETs: when the GPU needs current, it flows through the high-side MOSFET; when it does not, the low-side MOSFET carries the circuit's return path.
Chokes: chokes are inductors that reduce current noise as much as possible. The GPU needs a smooth, stable current to function properly, and the chokes provide it by filtering out the noise.
Output capacitors: after the chokes filter the current and before it is delivered to the target sections, the output capacitors smooth the final output of the circuit.
Cooling system
Every graphics card must stay at an optimal temperature to perform at its best. The cooling system on a graphics card not only reduces the product's operating temperature but also extends the life of its components. The system consists of two parts, a heatsink and a fan. The heatsink, usually made of copper or aluminum, is the passive part; its purpose is to draw heat away from the graphics processor and dissipate it into the surrounding air. The fan is the active part, blowing air across the heatsink so it can keep shedding heat. Some low-end graphics cards have only a heatsink, but almost all mid-range and high-end cards combine a heatsink and fan for proper, efficient cooling.
Types of graphics processor
Graphics processors come in several types, each handling graphics calculations and processing for the system in its own way; below we look at these types, how they work, and the advantages and disadvantages of each:
iGPU
iGPU (Integrated Graphics Processing Unit) is a graphics unit integrated onto the same chip as the central processor, or, in older designs, mounted on the motherboard. These units generally do not have much processing power and are not suited to advanced 3D game graphics and animation; they are designed for basic display work and cannot be upgraded. Using integrated graphics lets a system be thinner and lighter while reducing power consumption and cost. AMD markets its processors that combine a CPU with integrated graphics as APUs.
Of course, some modern processors ship with surprisingly powerful integrated graphics. Not every processor has an iGPU: for example, Intel desktop processors whose model number ends in F, and the company's X-series processors, have no graphics unit and are therefore sold at a lower price; systems built around them need a separate graphics card for graphics processing.
Currently, AMD and Intel are both working to improve their integrated graphics processors, and Apple has surprised many with its silicon chips, especially the M1 Max, whose very powerful integrated GPU can compete with high-end graphics.
dGPU
dGPU (Discrete Graphics Processing Unit) is a separate graphics processing unit used as a dedicated chip in a system. A discrete GPU is usually much more powerful than an iGPU and handles large, sophisticated 3D graphics workloads far more easily; for gaming systems or advanced 3D rendering and design, a powerful dGPU is essential. A discrete GPU can be replaced and upgraded easily, and alongside its high performance it comes with a dedicated cooling system so it does not overheat during heavy graphics processing. dGPUs are one reason gaming laptops are more expensive and heavier, with higher power consumption and shorter battery life than ordinary laptops; for that reason, a discrete GPU is recommended only if you use your system for gaming, 3D content creation, or other heavy workloads.
Currently, the biggest names in the discrete GPU industry are AMD and Nvidia, although Intel has also recently launched its own laptop GPUs in the form of the Arc series, and plans to launch desktop graphics cards as well.
However, discrete GPUs require a dedicated cooling system to prevent overheating and maximize performance, which is why gaming laptops are much heavier than traditional laptops.
Cloud GPU
A cloud GPU gives the user access to graphics processing power over the Internet; that is, without owning a GPU, you can use a graphics processor's computing power remotely. Cloud graphics will not suit every workload, but it is a good fit for those on a small budget who do not need very advanced local graphics processing; such users pay providers only for the cloud processing they actually consume.
eGPU
An external graphics card, or eGPU, is a graphics unit that sits outside the system in an enclosure with a PCIe slot and its own power supply; it connects to the system externally through USB-C or Thunderbolt ports. An eGPU lets the user pair powerful graphics with a compact, light system.
Apple Silicon M1: Mac computers equipped with the M1 chip do not support external graphics cards.
In recent years the use of external graphics has grown: since laptops generally offer less graphics power and lower-quality output than desktops, users have turned to eGPUs to close the gap. External graphics are mostly used with laptops, though some people use them to revive older desktops with low processing power. Note that if a laptop's graphics can be upgraded internally, an eGPU is hard to justify; and for desktop users, the large space an eGPU requires and its high cost mean that upgrading the internal graphics card is usually the better choice, especially for gamers.
Mobile GPU
Mobile GPUs shape our visual experience on phones and can even be decisive for some users (such as gamers), driving loyalty to a particular brand.
The mobile system-on-a-chip (SoC) inside today's phones contains, alongside the central processing unit, units for AI processing, an image signal processor for the camera, a modem, other important components, and a graphics processing unit. The mobile GPU was introduced to process heavy workloads such as 3D games and changed the phone world, especially for gamers. As mentioned, the processing cores in a mobile GPU are individually less powerful than CPU cores, but their fast, simultaneous operation makes it possible to display heavy content and complex graphics on phones.
Types of mobile GPUs
ARM is one of the main players in phone GPU design and owns the well-known Mali brand; Qualcomm holds a large share of the phone graphics market with its Adreno GPUs; and Imagination Technologies has produced PowerVR GPUs for years, which Apple used for a long time before developing its own graphics processor. Interestingly, unlike Apple, which designs its own GPUs, Samsung uses ARM or Qualcomm graphics processors for its phones' graphics workloads.
ARM; Mali GPU: Mali mobile GPUs are developed by ARM and cover a range of price points. For example, the GPU in the Galaxy S21 Ultra is the Mali-G78 MP14, which handles graphics processing with high speed and power.
Qualcomm; Adreno GPU: alongside the powerful Snapdragon processors it produces for Android, Qualcomm also performs brilliantly in mobile GPUs. Like Mali GPUs, Adreno units span a wide price range and target market. For example, the Adreno 660 used in the Asus ROG Phone 5 gaming phone in 2021 was recognized as one of Qualcomm's most powerful mobile GPUs.
Imagination Technologies; PowerVR GPU: PowerVR graphics processors were once used in the most popular iPhones, but Apple moved away from them when it began producing its own GPUs for the A-series Bionic chips; today, PowerVR GPUs appear mostly in affordable MediaTek chips, in budget and mid-range phones from brands like Motorola, Nokia, and Oppo.
Other applications of graphics processors
GPUs were originally developed as an evolution of graphics accelerators to lighten the CPU's workload. Until the last two decades they were mostly known as accelerators for rendering 3D graphics, especially in games. However, because these units have enormous parallel processing power and can process far more data in parallel than a central processing unit (CPU), they gradually found uses beyond gaming, such as machine learning and digital currency mining. Below, we look at some of these other uses of graphics processors:
Video editing
Modern graphics cards are equipped with dedicated video-encoding hardware and can prepare and format video data before playback. Video encoding is a complex, time-consuming process when left to the central processing unit alone. With their very fast parallel processing, GPUs can handle video encoding relatively quickly without overloading system resources. High-resolution encoding may still take some time even on powerful GPUs, but a GPU that supports the target video format will outperform the CPU by a wide margin for video editing.
3D graphics rendering
Although 3D graphics are most commonly used in video games and gaming, they are increasingly being used in other forms of media such as movies, television shows, advertisements, and digital art displays. Creating high-resolution 3D graphics, even with advanced hardware, just like video editing, can be an intensive and time-consuming process.
Modern film studios often depend on advanced GPU technology to produce realistic and dynamic computer graphics, making the hardware a vital part of the filmmaking process. Digital artists also use computers equipped with advanced graphics processors to create abstract works that cannot be produced in the usual physical space and produce works of art different from what we have seen so far. With the right combination of hardware performance and artistic vision, GPUs can be a powerful creative resource for computing and media content processing.
Machine learning
One of the lesser-known applications of modern GPUs is machine learning. Machine learning is a form of data analysis that automatically builds analytical models. Basically, machine learning uses data to learn, identify patterns, and make decisions independent of human input, and due to the very intensive nature of this system and the need for parallel processing, GPUs can be considered an essential part of this technology.
Machine learning underpins much of modern artificial intelligence, and it is a computationally demanding process that requires large amounts of input data for analysis. Software known as machine learning algorithms builds and fits models from what is called training data or sample data; the resulting models then make predictions or decisions without human intervention. The approach has been widely deployed in various fields, from medicine to email filtering systems that block inappropriate content, making machine learning a vital part of modern data infrastructure.
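The fit between machine learning and GPUs comes from the structure of training: the same arithmetic is applied independently to many data points at once. The sketch below is a minimal, CPU-only illustration of that idea, gradient descent on a one-parameter linear model; the per-example gradient computations in each step are exactly the kind of independent work a GPU parallelizes.

```python
# Minimal sketch of why machine learning maps well to GPUs: training reduces
# to the same arithmetic applied independently across many data points
# (here, gradient descent on a one-parameter linear model y = w * x).
# Each per-example gradient below could run in parallel on a GPU core.

data = [(x, 2.0 * x) for x in range(1, 6)]  # samples of the true function y = 2x

w = 0.0    # model parameter to learn
lr = 0.01  # learning rate
for _ in range(500):
    # Per-example gradients are independent of each other -> parallelizable.
    grads = [2 * (w * x - y) * x for x, y in data]
    w -= lr * sum(grads) / len(grads)

print(round(w, 3))  # learned weight converges to the true value 2.0
```

Real workloads replace the scalar `w` with millions of parameters and the loop body with large matrix multiplications, which is where GPU throughput dominates.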
Blockchain and digital currency mining
One of the more common uses of GPUs besides gaming is mining, or digital currency extraction. In the process of mining digital currencies (cryptocurrencies), system resources are put at the service of a blockchain, a continuously growing, cryptographically secured record of transaction data; each entry in this record is called a block, and producing one requires a certain amount of computing power. Although blockchain technology has applications beyond digital currencies, it is best known for mining (especially Bitcoin); the mining process can differ depending on the digital currency in question.
Specifically, Bitcoin mining allocates hardware resources to creating blocks in the Bitcoin blockchain; the more blocks are added, the more bitcoin is issued. The process consumes system resources and power and reduces the system's responsiveness while mining is underway. The high throughput and relatively good energy efficiency of GPUs made them a popular tool for mining, which has attracted many fans in recent years.
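The computing power mentioned above goes into a proof-of-work search. The sketch below is a greatly simplified illustration of that idea, not the actual Bitcoin protocol (`mine` is a hypothetical helper, and real mining hashes a structured block header twice): a nonce is varied until the hash meets a difficulty target, and because every nonce can be tried independently, the work parallelizes naturally across GPU cores.

```python
import hashlib

# Greatly simplified sketch of the proof-of-work search behind Bitcoin-style
# mining (not the real protocol): vary a nonce until the SHA-256 hash of the
# block data starts with the required number of zero hex digits. Every nonce
# can be tried independently, which is why massively parallel GPUs fit the job.

def mine(block_data: str, difficulty: int) -> int:
    """Return the first nonce whose hash meets the difficulty target."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

nonce = mine("example transactions", difficulty=4)
print(nonce)  # each extra zero of difficulty multiplies the expected search time by 16
```

Raising `difficulty` by one hex digit makes the expected number of hash attempts sixteen times larger, which is how the network keeps block production slow no matter how much hardware joins.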