In the software industry it is not uncommon to come across time-related bugs, or legacy software with serious issues in its handling of time. Even when designing new features, especially in mobile software, presenting time to a user who may be travelling around the world often causes headaches. The topic of time itself is huge, and it has bothered me that I haven't found a good article describing the relevant concepts of time from the software development perspective. This article is my personal take on the things one should consider when working with, or against, time.
When starting to work with dates and times, people are often confused about what exactly is so difficult about them. This is only natural, since we usually deal with dates and times in a fairly monocultural environment where everyone knows from experience what is meant. When the same concepts are applied globally and historically, things become much more difficult, so let’s look at some basic concepts of date and time and see what makes them complicated. Some of the examples are simplified to keep this article compact.
Calendar: a system measuring astronomical cycles, such as years (revolutions of the earth around the sun), months (revolutions of the moon around the earth) and days (rotations of the earth on its axis).
One difficulty is that a month (from one new moon to the next) is actually somewhere between 29.18 and 29.93 days long, depending on where the moon and the earth are on their elliptical orbits. It is even more complicated than that, because a full revolution of the moon can be measured in several ways, e.g. with respect to the stars or with respect to the sun. In any case, the number of days in a month is fractional and difficult to calculate. Similarly, there are about 12.37 months in a year, which is also an inconvenient number. The most common calendar in use is the Gregorian calendar, which is used in pretty much all international standards and forms the basis for many other calendars. However, other calendars are also in official use, like the Islamic Hijri calendar, which is purely lunar and uses completely different date numbering.
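To show that another calendar is simply a different numbering of the same days, here is a minimal sketch using Java's java.time API (Java 8 or later). It assumes the JDK's built-in `HijrahDate`, which by default follows the Umm al-Qura variant of the Hijri calendar used in Saudi Arabia. Other regions may follow other variants, so the exact day of the month can differ. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.chrono.HijrahDate;
import java.time.temporal.ChronoField;

// Hypothetical helper: converts between the Gregorian (ISO) calendar and the
// Islamic Hijri calendar (Umm al-Qura variant, the java.time default).
final class HijriConversion {

    // The Hijri date that falls on the given Gregorian date.
    static HijrahDate toHijri(LocalDate gregorian) {
        return HijrahDate.from(gregorian);
    }

    // The Gregorian date that falls on the given Hijri date.
    static LocalDate toGregorian(HijrahDate hijri) {
        return LocalDate.from(hijri);
    }

    public static void main(String[] args) {
        LocalDate gregorian = LocalDate.of(2015, 9, 1);
        HijrahDate hijri = toHijri(gregorian);

        // Same day, different year, month and day numbering.
        System.out.println(gregorian + " is " + hijri);
        System.out.println("Hijri year: " + hijri.get(ChronoField.YEAR));
        // Converting back yields the original Gregorian date.
        System.out.println("Round trip: " + toGregorian(hijri));
    }
}
```

Both `LocalDate` and `HijrahDate` identify the same underlying day, which is why the round trip is lossless.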
In addition to the officially used calendars, traditional calendars are quite commonly used to determine traditional holidays. For example, many Orthodox churches still use the Julian calendar (the astronomically less accurate predecessor of the Gregorian calendar) for their holidays. Also, holidays in China often fall on the new moon or the full moon, which means they still need to be determined using the traditional Chinese lunisolar calendar.
Era: the year numbering system used by a certain calendar during a certain period.
This is purely cultural. Several calendars in use today are identical in how they measure days, months and years, and differ only in how they count the years. The Gregorian calendar counts years from the assumed birth year of Jesus of Nazareth, but it traces back to the Roman calendar, which used to count years from the founding of Rome. The current Gregorian calendar era is traditionally called anno Domini, “in the year of the Lord”. Here we use the alternative name Common Era (CE) instead, because the era is now widely used in non-Christian contexts as well. As another example, the official calendar in Thailand since 1940 has been a renumbered Gregorian calendar in which year 2015 CE is year 2558 BE (Buddhist Era). The Buddhist Era starts from the assumed year in which the Buddha left his body and attained nirvana.
Similarly, the Republic of China (Taiwan) uses a Gregorian calendar whose first year is the year the republic was founded, so year 2015 CE is ROC (Republic of China) year 104. Coincidentally, the ROC calendar is identical to the Juche calendar used in North Korea since 1997, because Kim Il-sung was born in the same year the Republic of China was founded. Japan has also adopted the Gregorian calendar, but officially it still follows the old Chinese practice of era names, which means the year count restarts whenever a new emperor ascends the throne. The current era in Japan is Heisei, the era of Emperor Akihito, and year 2015 CE is year 27 of Heisei.
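The era examples above can be reproduced with the chronologies that ship with java.time (Java 8 or later). The following is a minimal sketch in which the class and method names are hypothetical. It relies on `ThaiBuddhistDate`, `MinguoDate` (the ROC calendar) and `JapaneseDate`, all of which use Gregorian months and days and differ only in how they number years.

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseDate;
import java.time.chrono.JapaneseEra;
import java.time.chrono.MinguoDate;
import java.time.chrono.ThaiBuddhistDate;
import java.time.temporal.ChronoField;

// Hypothetical helper: the same Gregorian day expressed under three era systems.
final class EraExamples {

    // Thai solar calendar: Buddhist Era year = Gregorian year + 543.
    static int buddhistEraYear(LocalDate date) {
        return ThaiBuddhistDate.from(date).get(ChronoField.YEAR);
    }

    // Republic of China (Minguo) calendar: ROC year = Gregorian year - 1911.
    static int rocYear(LocalDate date) {
        return MinguoDate.from(date).get(ChronoField.YEAR);
    }

    // Japanese years restart with every era, so the era must be kept with the year.
    static JapaneseEra japaneseEra(LocalDate date) {
        return JapaneseDate.from(date).getEra();
    }

    static int japaneseYearOfEra(LocalDate date) {
        return JapaneseDate.from(date).get(ChronoField.YEAR_OF_ERA);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2015, 9, 1);
        System.out.println("Buddhist Era: " + buddhistEraYear(date));
        System.out.println("ROC: " + rocYear(date));
        System.out.println("Japanese: " + japaneseEra(date) + " " + japaneseYearOfEra(date));
    }
}
```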
Time: a system dividing the rotation of the earth on its axis into smaller units that are easily and accurately measurable, mainly hours, minutes and seconds.
What makes time problematic is that the length of a solar day is not constant. It varies over the year because of the earth’s elliptical orbit and the tilt of its axis. This has been solved by using the mean solar day in calculations, since it is almost constant from year to year. Almost. Tidal friction, earthquakes and movements of the earth’s crust also affect the rotation itself.
For scientific purposes, the variation in rotation is currently handled by defining the SI second independently of the earth. One second is 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium-133 atom. There are over four hundred atomic clocks around the world counting time based on this definition, and together they define TAI (International Atomic Time), which is the basis of our current time. In addition, there is another definition of time called Universal Time, of which the UT1 version is the one that has remained in use. UT1 is based on the mean solar time at 0° longitude, so it drifts slightly from TAI as the rotation speed of the earth changes. Finally, UTC (Coordinated Universal Time) was created as a compromise: it ticks at the same rate as TAI, but leap seconds are added or removed to keep it always within 0.9 seconds of the “real solar time” UT1.
The history of time is extremely interesting in itself, and I highly recommend visiting the Royal Observatory in Greenwich, London, to understand how the measurement of time has changed our society and saved countless lives. However, the introduction of UTC has more or less solved the problem of time itself in our daily lives, and all the civil clocks in the world are now synchronised to it. Its slightly odd abbreviation is actually a compromise between CUT, proposed by English speakers, and TUC (for “temps universel coordonné”), proposed by French speakers. There is an ongoing debate about whether leap seconds should be abolished to simplify calculations, but that would mean UTC would no longer follow Universal Time.
Time zone: a region that observes a uniform standard time, meaning all clocks in the region are synchronised to show the exact same time.
The problem time zones try to solve is that people have historically timed their lives using midnight and noon as reference points. Always observing local solar time, as was still common in the early 19th century before railway networks were developed, is not practical, because people would have to resynchronise their clocks even on short journeys. On the other hand, it would be easy to set all the clocks in the world to UTC (or its predecessor GMT, Greenwich Mean Time), but then midnight in Hawaii, for example, would be at 10 AM. The locals might very well get used to this, but it would be very difficult for everyone else to tell what is early and what is late in Hawaii. Hence, time zones were invented as a compromise between these two extremes.
Most time zones are simply offset from UTC by a whole number of hours, and even where the offset is not a whole number of hours the concept is very clear. Two main things make time zones painful to handle: Daylight Saving Time (DST) and constant change. Daylight Saving Time is controversial. It was first implemented nationally by Germany and Austria-Hungary during World War I in order to save coal, and it has been widely adopted in the Western world since the 1970s. The idea is that clocks are moved forward during the summer in order to have more daylight after the workday. However, it is not clear whether this saves much energy in the modern world, where heating and especially lighting technology keep improving at a fast pace. What is clear is that DST makes local time non-continuous and difficult to handle.
To name some difficulties: in Finland, which follows the EU rules, the last Sunday of March has no 3:30 AM at all and the last Sunday of October has 3:30 AM twice (in countries on Central European Time, the affected hour runs from 2 AM to 3 AM instead). What makes things more complicated is that different countries switch to Daylight Saving Time on different dates, and in the southern hemisphere DST lasts from around October to April, the opposite of the northern hemisphere. One cannot even rely on midnight being 00:00 on a 24-hour clock. In the Gaza Strip (whose political status is very unclear, but which is de facto governed by Hamas), the authorities have decided to switch to Daylight Saving Time on the last Friday of March exactly at midnight. This means the clock jumps straight to 1:00 AM and local midnight never occurs that night. People have often combined a date with a zero time to construct a local midnight timestamp, but even that cannot be relied on without validation.
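Validating a local date and time against the zone rules, instead of assuming it exists, can be sketched with java.time (Java 8 or later). The class and method names below are hypothetical. The example uses `Europe/Helsinki` for the EU transitions. For the "no midnight" case it substitutes `America/Sao_Paulo`, because Brazil also switched to DST at midnight (on 18 October 2015) and its historical rules are less in flux than Gaza's. The results depend on the time zone data bundled with the JDK.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.List;

// Hypothetical helper for checking local times against a zone's rules.
final class LocalTimeValidation {

    // 0 offsets: the local time falls into a DST gap and never happens.
    // 1 offset:  the local time is unambiguous.
    // 2 offsets: the local time falls into an overlap and happens twice.
    static List<ZoneOffset> validOffsets(LocalDateTime localDateTime, ZoneId zone) {
        return zone.getRules().getValidOffsets(localDateTime);
    }

    // Asks the zone rules when the day actually starts instead of assuming 00:00.
    static ZonedDateTime startOfDay(LocalDate date, ZoneId zone) {
        return date.atStartOfDay(zone);
    }

    public static void main(String[] args) {
        ZoneId helsinki = ZoneId.of("Europe/Helsinki");
        System.out.println(validOffsets(LocalDateTime.of(2015, 3, 29, 3, 30), helsinki));  // []
        System.out.println(validOffsets(LocalDateTime.of(2015, 10, 25, 3, 30), helsinki)); // [+03:00, +02:00]
        System.out.println(validOffsets(LocalDateTime.of(2015, 9, 1, 3, 30), helsinki));   // [+03:00]

        // DST started at midnight, so this day begins at 01:00, not 00:00.
        System.out.println(startOfDay(LocalDate.of(2015, 10, 18), ZoneId.of("America/Sao_Paulo")));
    }
}
```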
Another problem is constant change: time zone rules are not stable. For example, in just the five years since 2010, the time zones in Russia have had three major changes that affected both the zone boundaries and the DST rules. In 2011 Samoa moved from UTC-11 to UTC+13, skipping 30 December 2011 entirely. When the People’s Republic of China was founded in 1949, it abolished time zones to simplify things, and since then everyone has followed Beijing Time all over the country. Even this is not as simple as it sounds. China experimented with daylight saving time from 1986 to 1991, and this still needs to be taken into account for events that happened in those years. The single time zone also gives the western parts of China, mainly Xinjiang and Tibet, sunrises as late as 9:41 in the winter and sunsets as late as 21:50 in the summer. As a result, the Uyghur minority in China has established its own unofficial time zone, and setting a clock to a certain time can even be considered a political statement in the Xinjiang area.
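Both historical quirks above are visible through the IANA rules bundled with java.time (Java 8 or later). In the following minimal sketch (hypothetical class and method names), adding one real hour just before Samoa's switch lands on 31 December, and Beijing Time shows a UTC+9 offset during the 1986 to 1991 DST experiment.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper showing historical time zone rules in action.
final class ZoneHistory {

    // One elapsed hour after 23:00 on 29 December 2011 in Samoa.
    // plusHours works on the instant time-line, so the skipped day is jumped over.
    static ZonedDateTime samoaOneHourLater() {
        ZonedDateTime beforeSwitch =
                ZonedDateTime.of(2011, 12, 29, 23, 0, 0, 0, ZoneId.of("Pacific/Apia"));
        return beforeSwitch.plusHours(1);
    }

    // The UTC offset of Beijing Time in force at the given local date and time.
    static ZoneOffset beijingOffset(LocalDateTime localDateTime) {
        return ZoneId.of("Asia/Shanghai").getRules().getOffset(localDateTime);
    }

    public static void main(String[] args) {
        System.out.println(samoaOneHourLater());                                 // 30 December never happened
        System.out.println(beijingOffset(LocalDateTime.of(1988, 7, 1, 12, 0))); // during the DST experiment
        System.out.println(beijingOffset(LocalDateTime.of(2015, 7, 1, 12, 0))); // plain UTC+8
    }
}
```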
There are numerous other unexpected changes as well. The most recent is North Korea changing its time zone from UTC+9 to UTC+8:30 in August 2015, because it considers the UTC+9 zone introduced by Japan during the Japanese colonisation of the Korean peninsula imperialistic. Egypt and Morocco, in turn, have recently decided to observe Daylight Saving Time but to suspend it during Ramadan, so they actually change their clocks four times a year. Countries generally consider their time zone part of their identity, which means political changes often result in time zone changes. And since historical events still need to be placed on a timeline, the old rules never become obsolete: when working with time, one can never escape history.
Once the basic concepts are understood, there are some details to take into account when using computers to calculate time. The international standard for representing and exchanging date and time data is ISO 8601, and the definitions used here are more or less in line with those of ISO 8601.
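Placing historical local times on the timeline means applying the rules that were in force at the time. The following minimal sketch uses java.time (Java 8 or later) with a hypothetical class name. It converts noon in Pyongyang before and after the August 2015 change. This assumes the JDK bundles time zone data released after that change.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical helper: the same wall-clock time maps to different instants
// depending on which rules were in force on that date.
final class PyongyangHistory {
    static final ZoneId PYONGYANG = ZoneId.of("Asia/Pyongyang");

    static Instant toInstant(LocalDateTime local) {
        return local.atZone(PYONGYANG).toInstant();
    }

    public static void main(String[] args) {
        System.out.println(toInstant(LocalDateTime.of(2015, 8, 1, 12, 0))); // 2015-08-01T03:00:00Z (UTC+9)
        System.out.println(toInstant(LocalDateTime.of(2015, 9, 1, 12, 0))); // 2015-09-01T03:30:00Z (UTC+8:30)
    }
}
```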
The most common way of counting time instants on computers is Unix time: the number of seconds that have elapsed since 00:00:00 UTC on 1 January 1970, not counting leap seconds. In general, when working with time in seconds, there is little to worry about. Leap seconds are the only possible problem, because Unix time handles a leap second by repeating a second in order to stay in sync with UTC. Some frameworks and libraries take a different approach. For example, the time-scale defined by the Java 8 date and time API is based on a variation called UTC-SLS (UTC with Smoothed Leap Seconds), which spreads the leap second over the 1000 seconds preceding it by slightly slowing down or speeding up the clock. Only applications that need extreme precision have to take leap seconds into account.
When talking about hours, one has to pay a bit more attention to presentation. The main thing to understand is that when we humans talk about an hour, we pretty much always mean 60 minutes. But because of time zone changes, 60 minutes after 2:30 AM can be 3:30 AM, 4:30 AM or something completely different, depending on the time zone and the date. Therefore, when working with time, one should always remember that a time instant cannot be presented in the local time used by humans without also knowing the date and the time zone.
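The following minimal sketch uses `java.time.Instant` (Java 8 or later), which counts seconds from the same 1970 epoch as Unix time. The class and method names are hypothetical. A leap second (23:59:60) was inserted at the end of 30 June 2015, so two SI seconds really elapsed between the two instants below. Neither Unix time nor `Instant` has a number for 23:59:60, so both count only one second.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: Unix-style second counts, which ignore leap seconds.
final class UnixTimeExample {

    static long unixSeconds(String isoInstant) {
        return Instant.parse(isoInstant).getEpochSecond();
    }

    static long secondsBetween(String from, String to) {
        return Duration.between(Instant.parse(from), Instant.parse(to)).getSeconds();
    }

    public static void main(String[] args) {
        System.out.println(unixSeconds("1970-01-01T00:00:00Z")); // 0
        System.out.println(unixSeconds("2015-07-01T00:00:00Z")); // 1435708800
        // Across the 2015-06-30 leap second: counted as 1 second, although 2 elapsed.
        System.out.println(secondsBetween("2015-06-30T23:59:59Z", "2015-07-01T00:00:00Z")); // 1
    }
}
```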
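The "60 minutes after 2:30 AM" problem can be reproduced with java.time (Java 8 or later), assuming `Europe/Helsinki` as the zone. In the minimal sketch below (hypothetical names), `ZonedDateTime.plusHours` adds elapsed hours on the instant time-line, so the local result follows the DST transitions.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper: adds real elapsed hours to a local time in a zone.
final class HourArithmetic {
    static final ZoneId HELSINKI = ZoneId.of("Europe/Helsinki");

    static ZonedDateTime plusHours(LocalDateTime local, ZoneId zone, long hours) {
        return local.atZone(zone).plusHours(hours);
    }

    public static void main(String[] args) {
        // An ordinary day: 2:30 AM + 60 minutes = 3:30 AM.
        System.out.println(plusHours(LocalDateTime.of(2015, 9, 1, 2, 30), HELSINKI, 1));
        // DST starts: 3:00 to 4:00 is skipped, so 2:30 AM + 60 minutes = 4:30 AM.
        System.out.println(plusHours(LocalDateTime.of(2015, 3, 29, 2, 30), HELSINKI, 1));
        // DST ends: 3:00 to 4:00 happens twice, so +1 h and +2 h both show 3:30 AM
        // (first with offset +03:00, then with +02:00).
        System.out.println(plusHours(LocalDateTime.of(2015, 10, 25, 2, 30), HELSINKI, 1));
        System.out.println(plusHours(LocalDateTime.of(2015, 10, 25, 2, 30), HELSINKI, 2));
    }
}
```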
It is highly recommended to handle dates using the Gregorian calendar, since it is the basis of the ISO 8601 interchange format for dates and times. The important thing to pay attention to is the slightly unintuitive arithmetic caused by months of variable length. For example, 2015-01-31 + 2 months results in 2015-03-31, but, depending a bit on the implementation, 2015-01-31 + 1 month + 1 month results in 2015-03-28. This is because February does not have 31 days, so the intermediate result is rounded down to the last day of February, the 28th, and adding another month keeps that day of the month. The same happens with years: 2012-02-29 + 1 year results in 2013-02-28, because 2013 is not a leap year. But unlike with time zones, people are usually already used to these anomalies. Financial markets often use a 360-day year (day count conventions such as 30/360) to avoid these problems, but that’s another topic.
A date on its own is mostly useful for recording things like birthdays or independence days, which people celebrate on the same day regardless of where they are. Dates can also be combined with a time to create a local date and time. Without knowing the exact time zone, however, it is neither possible to validate whether that combination exists nor to convert it to a UTC timestamp for chronological comparison. Converting a date from one calendar to another should always be safe, and it is very useful for localisation to different regions. And last but not least, one should never assume that a day is 24 hours long, or any fixed number of minutes or seconds. It might be, sometimes even without exception, but it should not be assumed.
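The month arithmetic above matches the behaviour of java.time's `LocalDate` (Java 8 or later), which clamps an invalid day of the month to the last valid day. Other libraries may resolve this differently, as the text notes. In the minimal sketch below, the helper name is hypothetical.

```java
import java.time.LocalDate;

// Demonstrates that adding months all at once and one at a time can differ.
final class MonthArithmetic {

    // Hypothetical helper: adds the months one at a time instead of all at once.
    static LocalDate addMonthsOneByOne(LocalDate date, int months) {
        LocalDate result = date;
        for (int i = 0; i < months; i++) {
            result = result.plusMonths(1); // 2015-02-31 does not exist, so it becomes 2015-02-28
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDate endOfJanuary = LocalDate.of(2015, 1, 31);
        System.out.println(endOfJanuary.plusMonths(2));             // 2015-03-31
        System.out.println(addMonthsOneByOne(endOfJanuary, 2));     // 2015-03-28
        System.out.println(LocalDate.of(2012, 2, 29).plusYears(1)); // 2013-02-28
    }
}
```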
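The assumption that a day is 24 hours long can be checked directly. The following minimal sketch uses java.time (Java 8 or later) with hypothetical names. It measures the elapsed time between the start of a date and the start of the next date in `Europe/Helsinki`. Note that java.time's `LocalDateTime` cannot be converted to an `Instant` at all without supplying a zone or offset, which mirrors the point made above.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical helper: how long a given calendar day really lasts in a zone.
final class DayLength {
    static final ZoneId HELSINKI = ZoneId.of("Europe/Helsinki");

    // atStartOfDay(zone) asks the zone rules when the day begins, and
    // Duration.between measures elapsed time on the instant time-line.
    static Duration lengthOf(LocalDate date, ZoneId zone) {
        return Duration.between(date.atStartOfDay(zone), date.plusDays(1).atStartOfDay(zone));
    }

    public static void main(String[] args) {
        System.out.println(lengthOf(LocalDate.of(2015, 9, 1), HELSINKI));   // PT24H
        System.out.println(lengthOf(LocalDate.of(2015, 3, 29), HELSINKI));  // PT23H
        System.out.println(lengthOf(LocalDate.of(2015, 10, 25), HELSINKI)); // PT25H
    }
}
```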
Time zones are described quite well in the earlier paragraphs, but one very common misunderstanding about them needs to be corrected. Many seem to think that combining a timestamp with the local UTC offset, as in “2015-09-01T08:00:00+03:00”, is enough to handle the time zone problem completely. This is true if one only cares about the exact time when an event happened (a timestamp) and the local time when it happened. But to do any operations on that time, like adding hours or days to it, we need to know the actual time zone where the event happened in order to keep the local time representation correct. In this particular case the IANA (Internet Assigned Numbers Authority) time zone name would be “Europe/Helsinki”, and with that information we can add days or hours without the risk of losing track of local time.
Because time zone rules can be extremely complicated, there is a quite comprehensive database called the IANA time zone database, which is used by almost all operating systems and platforms. It is sometimes also called the Olson database, after its original author Arthur David Olson. It does not even try to keep separate, exact histories for regions whose clocks have agreed since 1970, but for zones that are still in use it has quite a comprehensive history dating all the way back to the 19th century. The database is maintained by the community. Anyone who finds inaccuracies in it is free to email the maintainers with an explanation and evidence for the required change, and the change will most likely be included in the database.
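The difference between carrying only an offset and carrying the IANA zone shows up as soon as arithmetic crosses a DST transition. The following minimal sketch uses java.time (Java 8 or later) with hypothetical names. It adds two months to the example timestamp: once with a fixed-offset `OffsetDateTime`, and once with a `ZonedDateTime` in `Europe/Helsinki`.

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper comparing offset-only and zone-aware arithmetic.
final class OffsetVersusZone {
    static final ZoneId HELSINKI = ZoneId.of("Europe/Helsinki");

    // With only a fixed offset, the offset never changes, even after DST ends.
    static OffsetDateTime withOffsetOnly(String isoOffsetDateTime, long months) {
        return OffsetDateTime.parse(isoOffsetDateTime).plusMonths(months);
    }

    // With the IANA zone, the local time is kept and the offset follows Helsinki's rules.
    static ZonedDateTime withZone(String isoOffsetDateTime, long months) {
        return OffsetDateTime.parse(isoOffsetDateTime).atZoneSameInstant(HELSINKI).plusMonths(months);
    }

    public static void main(String[] args) {
        String meeting = "2015-09-01T08:00:00+03:00";
        OffsetDateTime offsetOnly = withOffsetOnly(meeting, 2);
        System.out.println(offsetOnly);                              // 2015-11-01T08:00+03:00
        System.out.println(offsetOnly.atZoneSameInstant(HELSINKI));  // 07:00 local time in Helsinki
        System.out.println(withZone(meeting, 2));                    // 2015-11-01T08:00+02:00[Europe/Helsinki]
    }
}
```

The offset-only result is a valid instant, but it no longer corresponds to 8 AM in Helsinki, which is usually what the user meant.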
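On the JVM, the IANA database is bundled with the runtime and can be queried through java.time (Java 8 or later). The following minimal sketch uses hypothetical class and method names. The version string it prints depends entirely on the JDK in use, which is worth checking given how often the rules change.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRulesProvider;

// Hypothetical helper for inspecting the bundled IANA time zone data.
final class TzdbLookup {

    // Latest version of the rules available for a zone, e.g. a tzdata release name.
    static String rulesVersion(String zoneId) {
        return ZoneRulesProvider.getVersions(zoneId).lastKey();
    }

    // The first offset change in the zone after the given instant.
    static ZoneOffsetTransition nextTransition(String zoneId, Instant after) {
        return ZoneId.of(zoneId).getRules().nextTransition(after);
    }

    public static void main(String[] args) {
        System.out.println(ZoneId.getAvailableZoneIds().contains("Europe/Helsinki"));
        System.out.println("tzdb version: " + rulesVersion("Europe/Helsinki"));
        System.out.println(nextTransition("Europe/Helsinki", Instant.parse("2015-09-01T05:00:00Z")));
    }
}
```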
Instant means an absolute point in time, usually represented as Unix time or as a Gregorian calendar date combined with UTC time. An instant can always be calculated from a local date, a local time and a time zone, provided that one decides how to resolve local times that fall into a DST gap or overlap. An instant can always be converted to a local date and/or local time if the time zone is known.
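The following minimal sketch shows both conversion directions using java.time (Java 8 or later) and hypothetical names. Going from local time to an instant needs a resolution policy for gaps and overlaps. By default java.time picks the earlier offset in an overlap and shifts a time in a gap forward by the length of the gap. Going from an instant to local time is always unambiguous once the zone is known.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper for converting between local date-times and instants.
final class InstantConversion {
    static final ZoneId HELSINKI = ZoneId.of("Europe/Helsinki");

    // Default resolution: earlier offset in an overlap, shifted forward in a gap.
    static Instant toInstant(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).toInstant();
    }

    // Alternative resolution: the later of the two instants in an overlap.
    static Instant toInstantLater(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).withLaterOffsetAtOverlap().toInstant();
    }

    // Instant to local time: a single answer once the zone is known.
    static LocalDateTime toLocal(Instant instant, ZoneId zone) {
        return ZonedDateTime.ofInstant(instant, zone).toLocalDateTime();
    }

    public static void main(String[] args) {
        LocalDateTime twice = LocalDateTime.of(2015, 10, 25, 3, 30);
        System.out.println(toInstant(twice, HELSINKI));      // 2015-10-25T00:30:00Z
        System.out.println(toInstantLater(twice, HELSINKI)); // 2015-10-25T01:30:00Z
        System.out.println(toInstant(LocalDateTime.of(2015, 3, 29, 3, 30), HELSINKI)); // 2015-03-29T01:30:00Z
        System.out.println(toLocal(Instant.parse("2015-09-01T05:00:00Z"), HELSINKI));  // 2015-09-01T08:00
    }
}
```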
Duration can be any length of time that one would conveniently measure with a stopwatch, including seconds, minutes and hours. A stopwatch doesn't have information about leap seconds or DST rules, and neither does duration.
Period means a calendar-based amount of time, such as the time between two dates, for example one month or two weeks. In practice a period is almost the same as a duration, but it is often handled separately to avoid confusion. For example, a duration of one day is always 24 hours, but a period of one day could be 23 hours, 24 hours or something else, depending on DST rules.
Interval means the span of time between two known instants. It can be constructed from two instants, or alternatively from an instant and a duration, in which case the other instant is calculated using the duration. Intervals are defined in ISO 8601 but are not used very often; for example, the Java 8 date and time API does not support them at all. An interval is most useful for checking whether two time spans with known start times overlap.
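java.time (Java 8 or later) keeps the two concepts in separate types, `Duration` and `Period`, and the difference shows up when both are added across a DST transition. The following minimal sketch, with a hypothetical class name, starts at noon in Helsinki on the day before DST began in 2015.

```java
import java.time.Duration;
import java.time.Period;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical demo: a duration of one day versus a period of one day.
final class DurationVersusPeriod {
    static final ZonedDateTime START =
            ZonedDateTime.of(2015, 3, 28, 12, 0, 0, 0, ZoneId.of("Europe/Helsinki"));

    public static void main(String[] args) {
        // Duration: exactly 24 elapsed hours, like a stopwatch.
        ZonedDateTime afterDuration = START.plus(Duration.ofDays(1));
        // Period: one calendar day, keeping the local time of day.
        ZonedDateTime afterPeriod = START.plus(Period.ofDays(1));

        System.out.println(afterDuration); // 2015-03-29T13:00+03:00[Europe/Helsinki]
        System.out.println(afterPeriod);   // 2015-03-29T12:00+03:00[Europe/Helsinki]
        System.out.println(Duration.between(START, afterPeriod)); // PT23H
    }
}
```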
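Because java.time has no interval type, the following is a minimal hand-written sketch of one. The `SimpleInterval` class is hypothetical and is not part of any library. It assumes half-open intervals (start inclusive, end exclusive), so back-to-back meetings do not count as overlapping, and it can be built from two instants or from an instant and a duration, as described above.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical minimal interval type: start inclusive, end exclusive.
final class SimpleInterval {
    private final Instant start;
    private final Instant end;

    SimpleInterval(Instant start, Instant end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        this.start = start;
        this.end = end;
    }

    // Builds an interval from a start instant and a duration.
    static SimpleInterval of(Instant start, Duration duration) {
        return new SimpleInterval(start, start.plus(duration));
    }

    // Two half-open intervals overlap when each one starts before the other ends.
    boolean overlaps(SimpleInterval other) {
        return start.isBefore(other.end) && other.start.isBefore(end);
    }

    Duration toDuration() {
        return Duration.between(start, end);
    }

    public static void main(String[] args) {
        SimpleInterval meeting = of(Instant.parse("2015-09-01T06:00:00Z"), Duration.ofHours(1));
        SimpleInterval review = of(Instant.parse("2015-09-01T06:30:00Z"), Duration.ofMinutes(45));
        SimpleInterval call = of(Instant.parse("2015-09-01T07:00:00Z"), Duration.ofMinutes(30));
        System.out.println(meeting.overlaps(review)); // true
        System.out.println(meeting.overlaps(call));   // false: the call starts exactly when the meeting ends
    }
}
```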
I have tried to sum up the most important things I would have liked to know when I first had to work with date and time handling. I genuinely hope this is useful and works as a short reference when dealing with the problems above. I also hope to find time to write more practical, real-world examples of how date and time should be handled in computer programs, but that is saved for another blog post. If there are any mistakes in the content, I will correct them as soon as possible.