Fun with numbers
Introduction
- The algorithm for computing the check digit for Singapore identity card numbers is unpublished
- Algorithm is partially described in various open sources
- Objective of this exercise is to elucidate the complete algorithm from internet resources and “virtual experimentation”
UIN/FIN structure
- The National Registration Identity Card (NRIC) number is the Unique Identification Number (UIN) or Foreigner Identification Number (FIN)
- Century prefix
- S, T - 19th and 20th letters of alphabet for UINs issued in 19xx and 20xx respectively
- F, G - Foreigners (not 7th and 8th century !)
- Check digit (official reference)
- Computed from first eight characters of UIN/FIN
- Detects data entry errors
UIN/FIN algorithm
- Government will release UIN/FIN algorithm for computing check digit, BUT
- “Application is open ONLY to Singapore-based organisations with the legitimate need for the UIN/FIN validation.”
- “Your application is subject to our final approval and our decision shall be final”
- License agreement requires:
- “The Licensee agrees to take all reasonable steps to protect the Licensed Material from unauthorised copying, adaptation or use.”
- License fee
- Algorithm $200
- Sample code $400
IP AnalysisCan the government really prohibit unauthorised use ?
- Copyright
- Source code is subject to copyright
- Algorithms are not subject to copyright
- Patent
- Algorithms are patentable, but
- Patent must be published
- Prior art probably exists in this case
- Patent, if any is long expired (> 20 yrs)
- Trade Secret
- May be protectable under the license agreement
- BUT, no secret if the information is already publicly available or obtained via a different route
-
-
Modulo 11 checksum
- Algorithm for S-series (old-style) NRIC numbers is well-known*
d = [(d1 d2 d3 d4 d5 d6 d7) • (2 7 6 5 4 3 2 )] mod 11
- Does this work for F, G, T-prefix UIN/FINs ?
Reverse Engineering the FIN algorithm
- Find a large set of FINs then reverse engineer the check digits to determine weights and mapping of checksum to letters
- MOM publishes a list of Registered Safety Officers on its website
- 48 out of 1,287 Safety officers are foreigners with FINs
- By inspection, same algorithm and same weights are used but with different check letters:
21st century UINs - T & G prefix
- Difficult to obtain large list of T-and G-series UINs
- Children born and foreigners registered during or after 2000
- Solution: Use a brute force approach and rely on the National Library web interface to check accuracy of guess
Virtual ExperimentVerifying UIN/FIN check digits
21st century UIN/FIN check digit
- By exhaustive search, we conclude for T-prefix UINs
- Same weighting factors and modulo 11 algorithm is used but
- Mapping of check digits is shifted 4 places
- Similar shift is observed for G-prefix FINs
Universal UIN/FIN Check Digit Algorithm
- For any UIN/FIN of format
P d1d2d3d4d5d6d7 C where P = Century prefix {S, T, F or G}
di = Number, i = 1..7
C = Check Digit (letter)
References
- UIN algorithm described in chapter 3 of course notes for NUS Coding Theory course (http://www.math.nus.edu.sg/~ma3218)
- S & T prefix algorithm confirmed
- No known public references to F, G-prefix FIN algorithm
- Hong Kong Identity Card
- HKID uses numerical check digit, e.g. B255241(3)
- Check digit given by modulo 11 checksum with weights (8, 7, 6, 5, 4, 3, 2) where letter prefix is converted to number A=1, B=2, etc.
- Use X if remainder is 10
- International Standard Book Number (ISBN)
- ISBN is 9 digit number with check digit given by modulo 11 checksum
- Weights (1, 2, 3, 4, 5, 6, 7, 8, 9)
- Use X if remainder is 10
Points to Ponder
- Why modulo 11 ?
- For numerical check digit, using modulo 11 allows checksum to be written as single digit (10 = X)
- For alphabetic check digit, modulo 26 is more likely to detect errors
-
- Why weights (2, 7, 6, 5, 4, 3, 2) ?
- Is there an optimal weighting scheme (compare to HKID, ISBN weighting factors) ?
-
- Why ABCDEFGHIZJ for S-prefix UINs ?
- Will there be U-series UINs in 2200 ?