Linux password manipulation in Python

Steven Taschuk staschuk at telusplanet.net
Tue Mar 18 02:42:45 EST 2003
Quoth John Krukoff:
> You probably want to look at man 3 crypt. [...]

Note that the crypt(3) page in the man-pages package won't help
here: it describes only the DES hashing method.  I presume you're
referring to the one which comes with glibc.

> On my box (gentoo):
>     
>     # grep jkrukoff /etc/shadow
>     jkrukoff:$1$TR8v8QBY$/RuCh8wlK.aHczufkXFbZ/:12129:0:99999:7:::
> 
>     # python
>     >>> from crypt import crypt
>     >>> crypt( 'bob', '$1$TR8v8QBY$' )
>     '$1$TR8v8QBY$/RuCh8wlK.aHczufkXFbZ/'
>
> Where the salt begins with '$1$', is a maximum of eight characters and
> is optionally terminated by a '$'. Needless to say, this probably isn't
> very portable.

The important thing for the OP is, I think, that the salt for the
MD5 method includes everything up to the third '$', and not just
the first two characters (as in the DES method).
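
To make the difference concrete, here is a minimal sketch of pulling
the salt out of a stored hash under both conventions.  The name
split_salt is made up for illustration; it assumes the stored string
is either a '$1$' hash or an old two-character-salt DES hash, and it
raises ValueError if a '$1$' hash has no third '$'.

```python
def split_salt(stored):
    """Return the salt part of a crypt(3) hash (hypothetical helper)."""
    if stored.startswith('$1$'):
        # MD5 method: the salt is everything up to and including
        # the third '$'.
        end = stored.index('$', 3)
        return stored[:end + 1]
    # DES method: the salt is just the first two characters.
    return stored[:2]

print(split_salt('$1$TR8v8QBY$/RuCh8wlK.aHczufkXFbZ/'))  # $1$TR8v8QBY$
```

The result is what gets passed as the second argument to crypt().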

I think this '$1$...$' business is more portable than you think:
the glibc sources have comments which suggest that they're
duplicating an existing implementation from elsewhere; the shadow
suite seems to have taken it from FreeBSD.  It's not *just* a GNU
extension.  Certainly, if you look in /etc/passwd or /etc/shadow
and find a hash that starts with '$1$' (which a traditional
DES-based hash would not, since '$' is not in its salt alphabet),
then it's a good guess that it's an MD5-based hash as above and
that the local system's crypt() will do the Right Thing if you
separate out the salt properly.
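
That guess can be sketched as a small check.  Everything named here
is an assumption for illustration: the pattern MD5_CRYPT, the
function check_password, and the choice to return None for anything
that does not look like an MD5 hash.  The crypt function is passed in
as crypt_fn rather than imported, since the standard crypt module is
Unix-only and is gone from recent Python 3 releases; where it exists,
crypt.crypt would be the natural argument.

```python
import hmac
import re

# Assumed shape of an MD5-crypt field: '$1$', up to eight salt
# characters, '$', then 22 characters from crypt's base64 alphabet.
MD5_CRYPT = re.compile(r'(\$1\$[^$:]{0,8}\$)[./0-9A-Za-z]{22}')

def check_password(password, stored, crypt_fn):
    """True/False if stored is an MD5 hash; None for any other scheme.

    crypt_fn is any callable with the signature crypt(key, salt).
    """
    match = MD5_CRYPT.fullmatch(stored)
    if match is None:
        return None           # not '$1$...'; leave it to the caller
    salt = match.group(1)     # everything up to the third '$'
    # Compare in constant time rather than with ==.
    return hmac.compare_digest(crypt_fn(password, salt), stored)
```

Returning None instead of False for unrecognised hashes keeps "wrong
password" distinct from "not an MD5 hash at all".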

In case anybody's interested, I include below a naïve
reimplementation in Python of glibc's md5_crypt.  It's an
entertainingly bizarre algorithm.

#!/usr/bin/env python
"""Reimplementation of MD5-based crypt(), based on glibc 2.2.4."""
import md5

def md5_crypt(key, salt):
    assert salt.startswith('$1$')
    assert salt.endswith('$')
    salt = salt[3:-1]
    assert len(salt) <= 8

    hash = md5.new()
    hash.update(key)
    hash.update('$1$')
    hash.update(salt)

    second_hash = md5.new()
    second_hash.update(key)
    second_hash.update(salt)
    second_hash.update(key)
    second_hash = second_hash.digest()
    q, r = divmod(len(key), len(second_hash))
    second_hash = second_hash*q + second_hash[:r]
    assert len(second_hash) == len(key)
    hash.update(second_hash)
    del second_hash, q, r

    # Comment in glibc 2.2.4 source:

    # "The original implementation now does something weird: for every 1
    # bit in the key the first 0 is added to the buffer, for every 0
    # bit the first character of the key.  This does not seem to be
    # what was intended but we have to follow this to be compatible."

    # But this is *not* what their code does.  The code alternates
    # between '\0' and key[0] based *not* on the bits of the key,
    # but on the bits of the representation of the *length* of the
    # key.  Weirdness on top of weirdness.

    i = len(key)
    while i > 0:
        if i & 1:
            hash.update('\0')
        else:
            hash.update(key[0])
        i >>= 1

    hash = hash.digest()

    for i in xrange(1000):
        nth_hash = md5.new()
        if i % 2:
            nth_hash.update(key)
        else:
            nth_hash.update(hash)
        if i % 3:
            nth_hash.update(salt)
        if i % 7:
            nth_hash.update(key)
        if i % 2:
            nth_hash.update(hash)
        else:
            nth_hash.update(key)
        hash = nth_hash.digest()

    # crypt's own base64 alphabet and bit order, not the MIME one:
    # each 24-bit group is emitted least significant six bits first
    base64 = './0123456789' \
        'ABCDEFGHIJKLMNOPQRSTUVWXYZ' \
        'abcdefghijklmnopqrstuvwxyz'
    def b64_three_char(char2, char1, char0, n):
        byte2, byte1, byte0 = map(ord, [char2, char1, char0])
        w = (byte2 << 16) | (byte1 << 8) | byte0
        s = []
        for _ in range(n):
            s.append(base64[w & 0x3f])
            w >>= 6
        return s

    result = ['$1$', salt, '$']
    result.extend(b64_three_char(hash[0], hash[6], hash[12], 4))
    result.extend(b64_three_char(hash[1], hash[7], hash[13], 4))
    result.extend(b64_three_char(hash[2], hash[8], hash[14], 4))
    result.extend(b64_three_char(hash[3], hash[9], hash[15], 4))
    result.extend(b64_three_char(hash[4], hash[10], hash[5], 4))
    result.extend(b64_three_char('\0', '\0', hash[11], 2))

    return ''.join(result)

def _test():
    assert md5_crypt('bob', '$1$TR8v8QBY$') == \
        '$1$TR8v8QBY$/RuCh8wlK.aHczufkXFbZ/'

if __name__ == '__main__':
    _test()
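
For anyone running this on Python 3, where the md5 module and xrange
no longer exist, the same steps can be sketched with hashlib.  This
is a port under assumptions, not the code above: the names
md5_crypt3 and ITOA64, taking the key as bytes, and raising
ValueError instead of asserting are all choices made here for
illustration.

```python
import hashlib

# crypt's own base64 alphabet (not the MIME one)
ITOA64 = './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'

def md5_crypt3(key, salt):
    """Python 3 port: key is bytes, salt is a str like '$1$TR8v8QBY$'."""
    if not (salt.startswith('$1$') and salt.endswith('$')):
        raise ValueError("salt must look like '$1$...$'")
    raw_salt = salt[3:-1].encode('ascii')
    if len(raw_salt) > 8:
        raise ValueError('salt is at most eight characters')

    ctx = hashlib.md5(key + b'$1$' + raw_salt)
    alt = hashlib.md5(key + raw_salt + key).digest()
    # Repeat the alternate digest until it is exactly len(key) bytes.
    q, r = divmod(len(key), len(alt))
    ctx.update(alt * q + alt[:r])

    # One byte per bit of len(key): NUL for a 1 bit, key[0] for a 0 bit.
    n = len(key)
    while n > 0:
        ctx.update(b'\0' if n & 1 else key[:1])
        n >>= 1
    digest = ctx.digest()

    # 1000 rounds mixing key, salt and the previous digest.
    for i in range(1000):
        h = hashlib.md5()
        h.update(key if i % 2 else digest)
        if i % 3:
            h.update(raw_salt)
        if i % 7:
            h.update(key)
        h.update(digest if i % 2 else key)
        digest = h.digest()

    def encode(b2, b1, b0, count):
        # Emit count characters, least significant six bits first.
        w = (b2 << 16) | (b1 << 8) | b0
        out = []
        for _ in range(count):
            out.append(ITOA64[w & 0x3f])
            w >>= 6
        return ''.join(out)

    d = digest  # indexing bytes yields ints in Python 3
    tail = ''.join([
        encode(d[0], d[6], d[12], 4),
        encode(d[1], d[7], d[13], 4),
        encode(d[2], d[8], d[14], 4),
        encode(d[3], d[9], d[15], 4),
        encode(d[4], d[10], d[5], 4),
        encode(0, 0, d[11], 2),
    ])
    return salt + tail

# Same test vector as _test() above.
assert md5_crypt3(b'bob', '$1$TR8v8QBY$') == \
    '$1$TR8v8QBY$/RuCh8wlK.aHczufkXFbZ/'
```

Taking the key as bytes sidesteps the question of which encoding a
password string should be hashed in; callers holding a str would
have to pick one themselves.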

-- 
Steven Taschuk             "The world will end if you get this wrong."
staschuk at telusplanet.net     -- "Typesetting Mathematics -- User's Guide",
                                 Brian Kernighan and Lorinda Cherry




