What Is The Difference Between Float And Double? - Stack Overflow
Asked 14 years, 9 months ago. Modified 11 months ago. Viewed 1.2m times. Score: 543.

I've read about the difference between double precision and single precision. However, in most cases, float and double seem to be interchangeable, i.e. using one or the other does not seem to affect the results. Is this really the case? When are floats and doubles interchangeable? What are the differences between them?
Asked Mar 5, 2010 at 12:48 by VaioIsBorn; edited Dec 31, 2021 at 9:51 by TheMaster.

14 Answers
Answer (score 644):
Huge difference.
As the name implies, a double has 2x the precision of float[1]. In general a double has 15 decimal digits of precision, while float has 7.
Here's how the number of digits is calculated:
double has 52 mantissa bits + 1 hidden bit: log(2^53) ÷ log(10) = 15.95 digits
float has 23 mantissa bits + 1 hidden bit: log(2^24) ÷ log(10) = 7.22 digits
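The digit counts above can be reproduced with a few lines of C. This is only a sketch: `decimal_digits` and `LOG10_OF_2` are names made up for this illustration, and the constant is hard-coded rather than computed with `log10`.

```c
#include <float.h>   /* FLT_MANT_DIG, DBL_MANT_DIG for callers */

/* log10(2), hard-coded so this sketch needs no math library call. */
#define LOG10_OF_2 0.30102999566398120

/* Decimal digits carried by a significand of `significand_bits` bits
   (hidden bit included): log10(2^bits) = bits * log10(2). */
double decimal_digits(int significand_bits)
{
    return significand_bits * LOG10_OF_2;
}
```

On a platform where float and double are IEEE binary32 and binary64, `decimal_digits(FLT_MANT_DIG)` is about 7.22 and `decimal_digits(DBL_MANT_DIG)` is about 15.95.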
This precision loss could lead to greater truncation errors being accumulated when repeated calculations are done, e.g.
    float a = 1.f / 81;
    float b = 0;
    for (int i = 0; i < 729; ++i)
        b += a;
    printf("%.7g\n", b); // prints 9.000023

while

    double a = 1.0 / 81;
    double b = 0;
    for (int i = 0; i < 729; ++i)
        b += a;
    printf("%.15g\n", b); // prints 8.99999999999996

Also, the maximum value of float is about 3e38, but double is about 1.7e308, so using float can hit "infinity" (i.e. a special floating-point number) much more easily than double for something simple, e.g. computing the factorial of 60.
During testing, maybe a few test cases contain these huge numbers, which may cause your programs to fail if you use floats.
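As a rough illustration of the factorial case mentioned above, the following sketch computes n! in both types. The function names are hypothetical, and the behaviour described assumes IEEE binary32/binary64 with the default rounding mode.

```c
/* n! accumulated in single precision; overflows to +infinity once the
   product exceeds the largest finite float (about 3.4e38). */
float factorial_float(int n)
{
    float result = 1.0f;
    for (int i = 2; i <= n; ++i)
        result *= (float)i;
    return result;
}

/* The same product in double precision, which stays finite far longer. */
double factorial_double(int n)
{
    double result = 1.0;
    for (int i = 2; i <= n; ++i)
        result *= (double)i;
    return result;
}
```

Under those assumptions, `factorial_float(60)` is infinite (checkable with `isinf` from `<math.h>`), while `factorial_double(60)` is a finite value of roughly 8.32e81.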
Of course, sometimes, even double isn't accurate enough, hence we sometimes have long double[1] (the above example gives 9.000000000000000066 on Mac), but all floating point types suffer from round-off errors, so if precision is very important (e.g. money processing) you should use int or a fraction class.
Furthermore, don't use += to sum lots of floating point numbers, as the errors accumulate quickly. If you're using Python, use fsum. Otherwise, try to implement the Kahan summation algorithm.
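For readers who have not seen it, here is a minimal sketch of Kahan (compensated) summation in C, next to the plain `+=` loop. The function names are illustrative, not from any library, and the sketch assumes the compiler is not allowed to reassociate floating-point arithmetic (for example, no `-ffast-math`), since that can optimize the compensation away.

```c
#include <stddef.h>

/* Plain accumulation: each addition can lose low-order bits. */
float naive_sum(const float *values, size_t count)
{
    float sum = 0.0f;
    for (size_t i = 0; i < count; ++i)
        sum += values[i];
    return sum;
}

/* Kahan summation: carries a running estimate of the bits lost so far
   and feeds it back into the next addition. */
float kahan_sum(const float *values, size_t count)
{
    float sum = 0.0f;
    float compensation = 0.0f;              /* error carried from earlier steps */
    for (size_t i = 0; i < count; ++i) {
        float y = values[i] - compensation; /* corrected next term */
        float t = sum + y;                  /* low-order bits of y may be lost here */
        compensation = (t - sum) - y;       /* recover what was lost */
        sum = t;
    }
    return sum;
}
```

Applied to an array holding 729 copies of `1.0f / 81`, `kahan_sum` should land within a few units in the last place of 9, whereas the answer reports 9.000023 for the plain loop.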
[1]: The C and C++ standards do not specify the representation of float, double and long double. It is possible that all three are implemented as IEEE double-precision. Nevertheless, for most compilers and architectures (gcc, MSVC; x86, x64, ARM) float is indeed an IEEE single-precision floating point number (binary32), and double is an IEEE double-precision floating point number (binary64).
Answered Mar 5, 2010 at 13:06 by kennytm; edited Jun 20, 2020 at 9:12.
- 20 The usual advice for summation is to sort your floating point numbers by magnitude (smallest first) before summing. – R.. GitHub STOP HELPING ICE Commented Aug 6, 2010 at 9:49
- 2 Note that while C/C++ float and double are nearly always IEEE single and double precision respectively, C/C++ long double is far more variable depending on your CPU, compiler and OS. Sometimes it's the same as double, sometimes it's some system-specific extended format, sometimes it's IEEE quad precision. – plugwash Commented Feb 8, 2019 at 5:27
- @R..GitHubSTOPHELPINGICE: why? Could you explain? – Sreeraj Chundayil Commented Jan 2, 2020 at 7:27
- 2 @InQusitive: Consider for example an array consisting of the value 2^24 followed by 2^24 repetitions of the value 1. Summing in order produces 2^24. Reversing produces 2^25. Of course you can make examples (e.g. make it 2^25 repetitions of 1) where any order ends up being catastrophically wrong with a single accumulator but smallest-magnitude-first is the best among such. To do better you need some kind of tree. – R.. GitHub STOP HELPING ICE Commented Jan 2, 2020 at 15:18
- 3 @R..GitHubSTOPHELPINGICE: summing is even more tricky if the array contains both positive and negative numbers. – chqrlie Commented Sep 7, 2020 at 8:59
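The 2^24 example from the comments above can be tried directly. This is a sketch with made-up function names, assuming float is IEEE binary32 with round-to-nearest-even; the loops generate the ones on the fly instead of storing an array.

```c
/* Large value first: 2^24 + 1 is not representable in binary32 and rounds
   back to 2^24, so every one of the 2^24 additions is lost.
   `volatile` forces each partial sum to be stored as a float, guarding
   against wider-precision registers on some targets. */
float sum_large_first(void)
{
    volatile float acc = 16777216.0f;   /* 2^24 */
    for (long i = 0; i < 16777216L; ++i)
        acc += 1.0f;
    return acc;
}

/* Small values first: the ones add up exactly to 2^24, and adding the
   large value then gives exactly 2^25. */
float sum_small_first(void)
{
    volatile float acc = 0.0f;
    for (long i = 0; i < 16777216L; ++i)
        acc += 1.0f;
    acc += 16777216.0f;
    return acc;
}
```

Under those assumptions `sum_large_first()` returns 16777216 (2^24) and `sum_small_first()` returns 33554432 (2^25), matching the comment.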
Here is what the C99 (ISO/IEC 9899, 6.2.5 §10) and C++03 (ISO/IEC 14882:2003, 3.9.1 §8) standards say:
There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.
The C++ standard adds:
The value representation of floating-point types is implementation-defined.
I would suggest having a look at the excellent What Every Computer Scientist Should Know About Floating-Point Arithmetic that covers the IEEE floating-point standard in depth. You'll learn about the representation details and you'll realize there is a tradeoff between magnitude and precision. The absolute precision of the floating-point representation increases as the magnitude decreases (the gap between neighbouring representable values shrinks), hence floating-point numbers between -1 and 1 are the most densely spaced.
Answered Mar 5, 2010 at 12:54 by Gregory Pakosz; edited May 24, 2021 at 21:13.

Answer (score 32):
Given a quadratic equation: x^2 − 4.0000000 x + 3.9999999 = 0, the exact roots to 10 significant digits are r1 = 2.000316228 and r2 = 1.999683772.
Using float and double, we can write a test program:
    #include <stdio.h>
    #include <math.h>

    void dbl_solve(double a, double b, double c)
    {
        double d = b*b - 4.0*a*c;
        double sd = sqrt(d);
        double r1 = (-b + sd) / (2.0*a);
        double r2 = (-b - sd) / (2.0*a);
        printf("%.5f\t%.5f\n", r1, r2);
    }

    void flt_solve(float a, float b, float c)
    {
        float d = b*b - 4.0f*a*c;
        float sd = sqrtf(d);
        float r1 = (-b + sd) / (2.0f*a);
        float r2 = (-b - sd) / (2.0f*a);
        printf("%.5f\t%.5f\n", r1, r2);
    }

    int main(void)
    {
        float fa = 1.0f;
        float fb = -4.0000000f;
        float fc = 3.9999999f;
        double da = 1.0;
        double db = -4.0000000;
        double dc = 3.9999999;
        flt_solve(fa, fb, fc);
        dbl_solve(da, db, dc);
        return 0;
    }

Running the program gives me:

    2.00000 2.00000
    2.00032 1.99968

Note that the numbers aren't large, but still you get cancellation effects using float.
(In fact, the above is not the best way of solving quadratic equations using either single- or double-precision floating-point numbers, but the answer remains unchanged even if one uses a more stable method.)
Answered Mar 5, 2010 at 17:57 by Alok Singhal; edited Aug 2, 2023 at 6:57 by remcycles.

Answer (score 19):
- A double is 64 bits and a single-precision float is 32 bits.
- The double has a bigger mantissa (the significant-digit bits of the real number).
- Any inaccuracies will be smaller in the double.
I just ran into an error that took me forever to figure out and that can potentially give you a good example of float precision.
    #include <iostream>
    #include <iomanip>

    int main() {
        for (float t = 0; t < 1; t += 0.01) {
            std::cout << std::fixed << std::setprecision(6) << t << std::endl;
        }
    }

The output is (wrapped to ten values per line):

    0.000000 0.010000 0.020000 0.030000 0.040000 0.050000 0.060000 0.070000 0.080000 0.090000
    0.100000 0.110000 0.120000 0.130000 0.140000 0.150000 0.160000 0.170000 0.180000 0.190000
    0.200000 0.210000 0.220000 0.230000 0.240000 0.250000 0.260000 0.270000 0.280000 0.290000
    0.300000 0.310000 0.320000 0.330000 0.340000 0.350000 0.360000 0.370000 0.380000 0.390000
    0.400000 0.410000 0.420000 0.430000 0.440000 0.450000 0.460000 0.470000 0.480000 0.490000
    0.500000 0.510000 0.520000 0.530000 0.540000 0.550000 0.560000 0.570000 0.580000 0.590000
    0.600000 0.610000 0.620000 0.630000 0.640000 0.650000 0.660000 0.670000 0.680000 0.690000
    0.700000 0.710000 0.720000 0.730000 0.740000 0.750000 0.760000 0.770000 0.780000 0.790000
    0.800000 0.810000 0.820000 0.830000 0.839999 0.849999 0.859999 0.869999 0.879999 0.889999
    0.899999 0.909999 0.919999 0.929999 0.939999 0.949999 0.959999 0.969999 0.979999 0.989999
    0.999999

As you can see after 0.83, the precision runs down significantly.
However, if I set up t as double, such an issue won't happen.
It took me five hours to realize this minor error, which ruined my program.
Answered Oct 20, 2015 at 6:51 by Elliscope Fang; edited Mar 10, 2018 at 11:06 by nbro.
- 5 just to be sure: the solution of your issue should be to use an int preferably? If you want to iterate 100 times, you should count with an int rather than using a double – BlueTrin Commented Sep 19, 2016 at 12:07
- 10 Using double is not a good solution here. You use int to count and do an internal multiplication to get your floating-point value. – Richard Commented Sep 24, 2017 at 23:10
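The approach suggested in the comments, counting with an int and deriving the floating-point value from the counter, might look like the sketch below. `grid_point` is a hypothetical helper. It divides by the step count instead of multiplying by 0.01f: that is a deliberate substitution, because a single correctly rounded division does not carry the representation error of 0.01f.

```c
/* Value of step i on a grid of `steps` equal steps over [0, 1].
   Each value is computed from the integer counter with one rounding,
   so errors do not accumulate from step to step. */
float grid_point(int i, int steps)
{
    return (float)i / (float)steps;
}
```

A loop such as `for (int i = 0; i < 100; ++i) { float t = grid_point(i, 100); /* use t */ }` then runs exactly 100 times, and on an IEEE platform each `t` is the float nearest to i/100 (for example, `grid_point(83, 100) == 0.83f`).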
There are three floating point types:
- float
- double
- long double
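A simple Venn diagram (image not preserved) illustrates the sets of values of these types: the values of float are a subset of the values of double, which are in turn a subset of the values of long double.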
Answered Sep 7, 2020 at 8:48 by Anushil Kumar.

Answer (score 12):
The size of the numbers involved in the floating-point calculations is not the most relevant thing. It's the calculation that is being performed that is relevant.
In essence, if you're performing a calculation and the result is an irrational number or recurring decimal, then there will be rounding errors when that number is squashed into the finite-size data structure you're using. Since double is twice the size of float, the rounding error will be a lot smaller.
The tests may specifically use numbers that would cause this kind of error, and therefore check that you've used the appropriate type in your code.
Answered Mar 5, 2010 at 13:05 by Dolbz; edited Mar 10, 2018 at 11:05 by nbro.

Answer (score 10):
Type float, 32 bits long, has a precision of 7 digits. While it can store values of very large or very small magnitude (roughly 10^-38 to 3.4 × 10^38, positive or negative), it has only 7 significant digits.
Type double, 64 bits long, has a bigger range (roughly 10^-308 to 10^308 in magnitude) and 15 digits of precision.
Type long double is nominally 80 bits, though a given compiler/OS pairing may store it as 12-16 bytes for alignment purposes. The long double has an exponent range that is just ridiculously huge and should have 19 digits of precision. Microsoft, in their infinite wisdom, limits long double to 8 bytes, the same as plain double.
Generally speaking, just use type double when you need a floating point value/variable. Literal floating point values used in expressions will be treated as doubles by default, and most of the math functions that return floating point values return doubles. You'll save yourself many headaches and typecastings if you just use double.
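The point about literals and expressions defaulting to double can be checked at compile time with C11's `_Generic`. The macro name `TYPE_NAME` below is made up for this sketch.

```c
/* Maps an expression to the name of its type (C11 _Generic).
   The expression is not evaluated, only its type is inspected. */
#define TYPE_NAME(expr) _Generic((expr), \
    float:       "float",                \
    double:      "double",               \
    long double: "long double",          \
    default:     "other")
```

With this macro, `TYPE_NAME(0.1)` is "double", `TYPE_NAME(0.1f)` is "float", `TYPE_NAME(0.1L)` is "long double", and a mixed expression such as `TYPE_NAME(1.0f * 2.0)` is "double" because the float operand is converted before the multiplication. Staying in float therefore needs the `f` suffix on literals and the `f`-suffixed math functions (for example `sqrtf` instead of `sqrt`).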
Answered Mar 8, 2011 at 5:13 by Zain Ali; edited Nov 17, 2017 at 23:29 by Peter Mortensen.
- Actually, for float it is between 7 and 8, 7.225 to be exact. – Peter Mortensen Commented Apr 12, 2013 at 20:25
Floats have less precision than doubles. Although you already know this, read What Every Computer Scientist Should Know About Floating-Point Arithmetic for a better understanding.
Answered Mar 5, 2010 at 12:54 by N 1.1; edited Dec 15, 2023 at 17:42 by Charles Burns.
- For instance, all AVR doubles are floats (four-byte). – Peter Mortensen Commented Apr 12, 2013 at 20:22
When using floating-point numbers you cannot trust that your local tests will behave exactly the same as the tests that are run on the server side. The environment and the compiler are probably different on your local system and where the final tests are run. I have seen this problem many times before in some TopCoder competitions, especially if you try to compare two floating-point numbers.
Answered Mar 5, 2010 at 13:00 by Tuomas Pelkonen.

Answer (score 3):
The built-in comparison operations differ: when you compare two floating-point numbers, the difference in data type (i.e. float or double) may result in different outcomes.
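A small sketch of how the data type changes a comparison: when a float is compared with a double, the float is widened first, and it only compares equal if the value fits a float exactly. `survives_as_float` is a hypothetical helper name, and the results quoted assume IEEE binary32/binary64.

```c
/* Returns 1 if storing x in a float keeps its value exactly, else 0. */
int survives_as_float(double x)
{
    float narrowed = (float)x;        /* round to single precision */
    return (double)narrowed == x;     /* widen again and compare */
}
```

Under that assumption `survives_as_float(0.5)` is 1 (0.5 is a power of two), while `survives_as_float(0.1)` is 0. For the same reason, after `float f = 0.1f;` the test `f == 0.1` is false although `f == 0.1f` is true.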
Answered Dec 7, 2011 at 7:40 by Johnathan Lau; edited Nov 5, 2012 at 1:35 by mbinette.

Answer (score 2):
Quantitatively, as other answers have pointed out, the difference is that type double has about twice the precision, and three times the range, as type float (depending on how you count).
But perhaps even more important is the qualitative difference. Type float has good precision, which will often be good enough for whatever you're doing. Type double, on the other hand, has excellent precision, which will almost always be good enough for whatever you're doing.
The upshot, which is not nearly as well known as it should be, is that you should almost always use type double. Unless you have some particularly special need, you should almost never use type float.
As everyone knows, "roundoff error" is often a problem when you're doing floating-point work. Roundoff error can be subtle, and difficult to track down, and difficult to fix. Most programmers don't have the time or expertise to track down and fix numerical errors in floating-point algorithms — because unfortunately, the details end up being different for every different algorithm. But type double has enough precision such that, much of the time, you don't have to worry. You'll get good results anyway. With type float, on the other hand, alarming-looking issues with roundoff crop up all the time.
And the thing that's not necessarily different between type float and double is execution speed. On most of today's general-purpose processors, arithmetic operations on type float and double take more or less exactly the same amount of time. Everything's done in parallel, so you don't pay a speed penalty for the greater range and precision of type double. That's why it's safe to make the recommendation that you should almost never use type float: Using double shouldn't cost you anything in speed, and it shouldn't cost you much in space, and it will almost definitely pay off handsomely in freedom from precision and roundoff error woes.
(With that said, though, one of the "special needs" where you may need type float is when you're doing embedded work on a microcontroller, or writing code that's optimized for a GPU. On those processors, type double can be significantly slower, or practically nonexistent, so in those cases programmers do typically choose type float for speed, and maybe pay for it in precision.)
Answered Feb 26, 2022 at 12:34 by Steve Summit; edited Aug 11, 2022 at 0:29.

Answer (score 1):
If one works with embedded processing, the underlying hardware (e.g. an FPGA or some specific processor / microcontroller model) may implement float optimally in hardware while double uses software routines. So if the precision of a float is enough to handle the needs, the program will execute several times faster with float than with double. As noted in other answers, beware of accumulation errors.
Answered May 7, 2020 at 13:36 by Lissandro.

Answer (score -2):
Unlike an int (whole number), a float has a decimal point, and so does a double. But the difference between the two is that a double is twice as detailed as a float, meaning that it can have double the number of digits after the decimal point.
Answered Sep 5, 2017 at 12:10 by Nykal.
- 6 It doesn't mean that at all. It actually means twice as many integral decimal digits, and it is more than double. The relationship between fractional digits and precision is not linear: it depends on the value: e.g. 0.5 is precise but 0.33333333333333333333 is not. – user207421 Commented Sep 24, 2017 at 23:34