Tutorial: searching LDAP entries

Note

A more pythonic LDAP: LDAP operations look clumsy and hard to use because they reflect the age-old idea that time-consuming operations should be done on the client, so that the server is not cluttered and hogged with unneeded processing. ldap3 includes a fully functional Abstraction Layer that lets you interact with the DIT in a modern and pythonic way. With the Abstraction Layer you don’t need to directly issue any LDAP operation at all.

Finding entries

To find entries in the DIT you must use the Search operation. This operation has a number of parameters, but only two of them are mandatory:

search_base: the location in the DIT where the search will start
search_filter: a string that describes what you are searching for

Search filters are based on assertions and look odd when you’re unfamiliar with their syntax. An assertion is a bracketed expression that affirms something about an attribute and its values, such as (givenName=John) or (maxRetries>=10). On the server, each assertion resolves to True, False, or Undefined (which is treated as False) for one or more entries in the DIT. Assertions can be grouped in boolean groups where all assertions (and group, specified with &) or at least one assertion (or group, specified with |) must be True. A single assertion can be negated (not group, specified with !). Each group must be bracketed, allowing for recursive filters.

Operators allowed in an assertion are = (equal), <= (less than or equal), >= (greater than or equal), =* (present), ~= (approximate), and := (extensible). Surprisingly, the less than and the greater than operators don’t exist in the LDAP filter syntax. The approximate and the extensible operators are obscure and seldom used. In an equality filter you can use the * character as a wildcard.

For example, to search for all users named John with an email ending with ‘@example.org’ the filter will be (&(givenName=John)(mail=*@example.org)). To search for all users named John or Fred with an email ending in ‘@example.org’ the filter will be (&(|(givenName=Fred)(givenName=John))(mail=*@example.org)). To search for all users that have a givenName different from Smith the filter will be (!(givenName=Smith)).
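The same filters can be composed programmatically instead of being typed by hand. The following is only a minimal sketch: the helper names (escape_value, equal, ends_with, all_of, any_of, negate) are hypothetical and are not part of ldap3. They just build strings in the filter syntax described above, escaping the characters that RFC 4515 reserves inside an assertion value.

```python
def escape_value(text):
    """Escape the characters RFC 4515 reserves inside an assertion value."""
    replacements = {
        '\\': r'\5c',
        '*': r'\2a',
        '(': r'\28',
        ')': r'\29',
        '\x00': r'\00',
    }
    # Replace character by character so that nothing is escaped twice.
    return ''.join(replacements.get(char, char) for char in text)


def equal(attribute, value):
    """Build an equality assertion such as (givenName=John)."""
    return '({}={})'.format(attribute, escape_value(value))


def ends_with(attribute, suffix):
    """Build an equality assertion with a leading wildcard."""
    return '({}=*{})'.format(attribute, escape_value(suffix))


def all_of(*filters):
    """Build an and group: every inner filter must be True."""
    return '(&' + ''.join(filters) + ')'


def any_of(*filters):
    """Build an or group: at least one inner filter must be True."""
    return '(|' + ''.join(filters) + ')'


def negate(inner_filter):
    """Build a not group around a single filter."""
    return '(!' + inner_filter + ')'


john_or_fred = all_of(
    any_of(equal('givenName', 'Fred'), equal('givenName', 'John')),
    ends_with('mail', '@example.org'),
)
print(john_or_fred)
```

The nesting of the Python calls mirrors the nesting of the brackets, which makes it harder to leave a group unbalanced.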

Long search filters are difficult to understand. It may be useful to divide the text on multiple indented lines:

(&
    (|
        (givenName=Fred)
        (givenName=John)
    )
    (mail=*@example.org)
)
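One cautious way to use the indented form is to keep it in your source code and collapse it to a single line before passing it to the search. The sketch below assumes nothing about how the library treats embedded whitespace; collapse_filter is a hypothetical helper, not an ldap3 function. It strips the whitespace at both ends of every line, so it is not suitable for assertion values that must begin or end with a space at a line boundary.

```python
def collapse_filter(multiline_filter):
    """Join an indented, multi-line filter into a single-line string."""
    return ''.join(line.strip() for line in multiline_filter.splitlines())


READABLE_FILTER = """
(&
    (|
        (givenName=Fred)
        (givenName=John)
    )
    (mail=*@example.org)
)
"""

search_filter = collapse_filter(READABLE_FILTER)
print(search_filter)
```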

Let’s search all users in the FreeIPA demo LDAP server:

>>> from ldap3 import Server, Connection, ALL
>>> server = Server('ipa.demo1.freeipa.org', get_info=ALL)
>>> conn = Connection(server, 'uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org', 'Secret123', auto_bind=True)
>>> conn.search('dc=demo1,dc=freeipa,dc=org', '(objectclass=person)')
True
>>> conn.entries
[DN: uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org
, DN: uid=manager,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org
, DN: uid=employee,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org
, DN: uid=helpdesk,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org
]

Here you request all the entries of class person, starting from the dc=demo1,dc=freeipa,dc=org context with the default subtree scope. You have not requested any attribute, so in the response you only get the Distinguished Name of the found entries.

Note

response vs result: in ldap3 every operation has a result that is stored in the result attribute of the Connection in sync strategies. Search operations store the found entries in the response attribute of the Connection object. For asynchronous strategies you must use the get_response(id) method that returns a tuple in the form of (response, result). If you use the get_request=True parameter you ask get_response() to also return the request dictionary, so the returned tuple will be (response, result, request).
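As an illustration of the difference, the sketch below reads both attributes after a search performed with a sync strategy. The function name summarize_last_search is hypothetical. It assumes that result is a dictionary with a 'description' key and that each item in response is a dictionary with a 'type' key (with 'searchResEntry' marking an entry) and a 'dn' key; check these against the output of your own connection before relying on them.

```python
def summarize_last_search(conn):
    """Return (result description, list of DNs) for the last search on conn.

    conn is expected to be a Connection using a sync strategy, so that
    conn.result and conn.response are already populated.
    """
    # result describes the outcome of the operation itself.
    description = conn.result['description']
    # response holds what the search found; skip anything that is not an entry.
    dns = [item['dn'] for item in conn.response
           if item.get('type') == 'searchResEntry']
    return description, dns
```

After a successful conn.search(...) call, summarize_last_search(conn) would return the outcome of the operation together with the Distinguished Names of the entries found.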

Now let’s try to request some attributes from the admin user:

>>> conn.search('dc=demo1,dc=freeipa,dc=org', '(&(objectclass=person)(uid=admin))', attributes=['sn', 'krbLastPwdChange', 'objectclass'])
True
>>> conn.entries[0]
DN: uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org - STATUS: Read - READ TIME: 2016-10-09T20:39:32.711000
krbLastPwdChange: 2016-10-09 10:01:18+00:00
objectclass: top
             person
             posixaccount
             krbprincipalaux
             krbticketpolicyaux
             inetuser
             ipaobject
             ipasshuser
             ipaSshGroupOfPubKeys
             ipaNTUserAttrs
sn: Administrator

Warning

When using attributes in a search filter, it’s a good habit to always include the structural class of the objects you expect to retrieve. Why? You cannot be sure that the attribute you’re searching for is not used in some other object class. Even if you are sure that no other object class uses it, this could always change in the future when the schema is extended with an object class that uses that very same attribute, leading your program to suddenly break for no apparent reason.

Note that the entries attribute of the Connection object is derived from the ldap3 Abstraction Layer and it’s specially crafted to be used in interactive mode at the >>> prompt. It gives a visual representation of the entry data structure, and each value is properly formatted according to the schema (the date value in krbLastPwdChange is actually stored as b'20161009010118Z', but it’s shown as a Python datetime object). Attributes can be queried either as a class or as a dict, with some additional features such as case-insensitivity and blank-insensitivity. You can get the formatted values and the raw values (the values actually returned by the server) in the values and raw_values attributes:

>>> entry = conn.entries[0]
>>> entry.krbLastPwdChange
krbLastPwdChange: 2016-10-09 10:01:18+00:00
>>> entry.KRBLastPwdCHANGE
krbLastPwdChange: 2016-10-09 10:01:18+00:00
>>> entry['krbLastPwdChange']
krbLastPwdChange: 2016-10-09 10:01:18+00:00
>>> entry['KRB LAST PWD CHANGE']
krbLastPwdChange: 2016-10-09 10:01:18+00:00

>>> entry.krbLastPwdChange.values
[datetime.datetime(2016, 10, 9, 10, 1, 18, tzinfo=OffsetTzInfo(offset=0, name='UTC'))]
>>> entry.krbLastPwdChange.raw_values
[b'20161009010118Z']

Note that the entry status is Read. This is not relevant if you only need to retrieve entries from the DIT, but it’s vital if you want to use the ldap3 Abstraction Layer to make the entry Writable and then change or delete its content. The Abstraction Layer also records the time of the last data read operation for the entry.

In the previous search operations you specified dc=demo1,dc=freeipa,dc=org as the base of the search, but the entries returned were in the cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org context of the DIT. So the server has, for no apparent reason, walked down every context under the base, applying the filter to each of the entries in the sub-containers. The server actually performed a whole subtree search. Other possible kinds of search are the single level search (that searches only the entries immediately below the base) and the base object search (that searches only the attributes of the entry specified in the base). What changes among these kinds of search is the ‘breadth’ of the portion of the DIT that is searched. This breadth is called the scope of the search and can be specified with the search_scope parameter of the search operation. It can take three different values: BASE, LEVEL and SUBTREE. The last value is the default for the search operation, which explains why you got back all the entries in the sub-containers of the base in the previous searches.
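To make the three scopes concrete, the following sketch decides on the client side whether an entry would fall inside a given scope. It is purely illustrative and not how ldap3 or the server works: in_scope is a hypothetical helper, the scope is passed as a plain string, and the DN is split naively on commas, so DNs containing escaped commas are not handled.

```python
def in_scope(entry_dn, base_dn, scope):
    """Tell whether entry_dn falls inside the given scope relative to base_dn."""
    # Naive DN parsing: one component per comma, compared case-insensitively.
    entry = [rdn.strip().lower() for rdn in entry_dn.split(',')]
    base = [rdn.strip().lower() for rdn in base_dn.split(',')]
    # The entry must be the base itself or lie somewhere below it.
    if len(entry) < len(base) or entry[len(entry) - len(base):] != base:
        return False
    depth = len(entry) - len(base)
    if scope == 'BASE':
        return depth == 0      # only the base entry itself
    if scope == 'LEVEL':
        return depth == 1      # only the immediate children of the base
    if scope == 'SUBTREE':
        return True            # the base and everything below it
    raise ValueError('unknown scope: {!r}'.format(scope))


BASE_DN = 'dc=demo1,dc=freeipa,dc=org'
ADMIN_DN = 'uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org'

for scope in ('BASE', 'LEVEL', 'SUBTREE'):
    print(scope, in_scope(ADMIN_DN, BASE_DN, scope))
```

The admin entry sits three levels below the base, so only the subtree scope reaches it, which matches the behaviour observed in the previous searches.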

You can get an LDIF representation of an entry found by a search with:

>>> print(conn.entries[0].entry_to_ldif())
version: 1
dn: uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org
objectclass: top
objectclass: person
objectclass: posixaccount
objectclass: krbprincipalaux
objectclass: krbticketpolicyaux
objectclass: inetuser
objectclass: ipaobject
objectclass: ipasshuser
objectclass: ipaSshGroupOfPubKeys
krbLastPwdChange: 20161009010118Z
sn: Administrator
# total number of entries: 1

Note

LDIF stands for LDAP Data Interchange Format and is a textual standard used to describe two different aspects of LDAP: the content of an entry (LDIF-CONTENT) and the changes performed on an entry with an LDAP operation (LDIF-CHANGE). LDIF-CONTENT is used to describe LDAP entries in a stream (i.e. a file or a socket), while LDIF-CHANGE is used to describe the Add, Delete, Modify and ModifyDn operations.

These two formats have different purposes and cannot be mixed in the same stream.

You can also serialize the entry to a JSON string:

>>> print(entry.entry_to_json())
{
    "attributes": {
        "krbLastPwdChange": [
            "2016-10-09 10:01:18+00:00"
        ],
        "objectclass": [
            "top",
            "person",
            "posixaccount",
            "krbprincipalaux",
            "krbticketpolicyaux",
            "inetuser",
            "ipaobject",
            "ipasshuser",
            "ipaSshGroupOfPubKeys"
        ],
        "sn": [
            "Administrator"
        ]
    },
    "dn": "uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org"
}

Searching for binary values

To search for a binary value you must use the RFC 4515 ASCII escape sequence for each byte in the search assertion. ldap3 provides the helper function escape_bytes(byte_value) in ldap3.utils.conv to properly escape a byte sequence:

>>> from ldap3.utils.conv import escape_bytes
>>> unique_id = b'\xca@\xf2k\x1d\x86\xcaL\xb7\xa2\xca@\xf2k\x1d\x86'
>>> search_filter = '(nsUniqueID=' + escape_bytes(unique_id) + ')'
>>> conn.search('dc=demo1,dc=freeipa,dc=org', search_filter, attributes=['nsUniqueId'])

search_filter will contain (nsUniqueID=\ca\40\f2\6b\1d\86\ca\4c\b7\a2\ca\40\f2\6b\1d\86), with each byte written as a backslash followed by two hexadecimal digits. The \xx escaping format is specific to the LDAP protocol.
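The escaping itself is simple enough to sketch in a few lines. The function below is a hypothetical stand-in written for illustration, not the ldap3 implementation: it writes every byte as a backslash followed by two lowercase hexadecimal digits. In real code, prefer the escape_bytes() helper shown above.

```python
def escape_bytes_sketch(value):
    """Write each byte of value as a backslash plus two hex digits."""
    # Iterating over a bytes object yields integers in Python 3.
    return ''.join('\\{:02x}'.format(byte) for byte in value)


unique_id = b'\xca@\xf2k\x1d\x86\xcaL\xb7\xa2\xca@\xf2k\x1d\x86'
search_filter = '(nsUniqueID=' + escape_bytes_sketch(unique_id) + ')'
print(search_filter)
```

Printable bytes such as @ (hex 40) and k (hex 6b) are escaped as well, which is harmless because the escaped and unescaped forms denote the same byte.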

Entries Retrieval

Raw values for the attributes retrieved in an entry are stored in the raw_attributes dictionary in the response attribute. ldap3 provides some standard formatters used to format the values retrieved in a Search operation as specified by the RFCs according to the current schema syntaxes. If the schema is known (with get_info=SCHEMA or get_info=ALL in the Server object) and the check_names parameter of the Connection object is set to True, the attributes attribute is populated with the formatted values. If the attribute is defined in the schema as multi-valued, then the attribute value is returned as a list (even if only a single value is present); otherwise it’s returned as a single value.

Custom formatters can be added to specify how attribute values are returned. A formatter must be a callable that receives a bytes value and returns an object. It should never raise exceptions but it must return the original value if it’s not able to properly format the object.
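A formatter that follows this contract could look like the sketch below. The function name and the timestamp layout it accepts (UTC times written as YYYYMMDDHHMMSSZ) are assumptions made for the example. How the callable is registered with the library is deliberately left out; check the ldap3 documentation for the supported mechanism.

```python
from datetime import datetime, timezone


def format_utc_timestamp(raw_value):
    """Turn b'YYYYMMDDHHMMSSZ' into an aware datetime, or return it unchanged."""
    try:
        text = raw_value.decode('ascii')
        parsed = datetime.strptime(text, '%Y%m%d%H%M%SZ')
    except (AttributeError, UnicodeDecodeError, ValueError):
        # Never raise: hand back the original value when it cannot be formatted.
        return raw_value
    return parsed.replace(tzinfo=timezone.utc)


print(format_utc_timestamp(b'20240102030405Z'))
print(format_utc_timestamp(b'not a timestamp'))
```

Catching the exceptions inside the formatter keeps a single malformed value from breaking the whole search response.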

What about empty attributes?

The LDAP protocol specifies that an attribute always contains a value. An attribute with no value is immediately removed by the LDAP server from the stored entry. This makes it harder to access the entry in your code because you must always check whether an attribute key is present before accessing its value, to avoid exceptions. ldap3 helps you write simpler code: by default it returns an empty attribute for every attribute you request in the attributes parameter of the search operation, even if that attribute is not present in the entry on the server. To change this behaviour, set the return_empty_attributes parameter to False in the Connection object.

Simple Paged search

The Search operation can perform a simple paged search as specified in RFC 2696. The RFC states that you can ask the server to return a specific number of entries in each response set. With each search, the server sends back a cookie that you have to provide in each subsequent search. All this information must be passed in a Control attached to the request and the server responds with similar information in a Control attached to the response. ldap3 hides all this machinery in the paged_search() function of the extend.standard namespace:

>>> entries = conn.extend.standard.paged_search('dc=demo1,dc=freeipa,dc=org', '(objectClass=person)', attributes=['cn', 'givenName'], paged_size=5)
>>> for entry in entries:
...     print(entry)

Entries are returned in a generator, which can be useful when you have a very long list of entries or memory limitations. The requests are sent to the LDAP server only when entries are consumed from the generator. Remember, a generator can be used only once, so you must process the results sequentially. If you don’t want the entries returned in a generator, you can pass the generator=False parameter to get all the entries in a list. In this case all the paged searches are performed by the paged_search() function and the entries found are collected in a list.

If you want to directly use the Search operation to perform a Paged search your code should be similar to the following:

>>> searchParameters = {'search_base': 'dc=demo1,dc=freeipa,dc=org',
...                     'search_filter': '(objectClass=Person)',
...                     'attributes': ['cn', 'givenName'],
...                     'paged_size': 5}
>>> while True:
...     conn.search(**searchParameters)
...     for entry in conn.entries:
...         print(entry)
...     cookie = conn.result['controls']['1.2.840.113556.1.4.319']['value']['cookie']
...     if cookie:
...         searchParameters['paged_cookie'] = cookie
...     else:
...         break

Even in this case, the ldap3 library hides the Simple Paged Results Control machinery, but you have to manage the cookie yourself. The code would be much longer if you managed the control directly. Also, you lose the generator feature.
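If you manage the cookie yourself but still want lazy iteration, the loop can be wrapped in a generator function. This is a minimal sketch under the same assumptions as the loop above (a sync strategy, with the cookie found under the 1.2.840.113556.1.4.319 control in conn.result); iter_paged_entries is a hypothetical name, not an ldap3 function. Do not run other operations on the same connection while the generator is being consumed, because each page relies on the state left by the previous search.

```python
PAGED_RESULTS_OID = '1.2.840.113556.1.4.319'


def iter_paged_entries(conn, page_size, **search_parameters):
    """Yield entries one by one, requesting the next page only when needed."""
    cookie = None
    while True:
        conn.search(paged_size=page_size, paged_cookie=cookie,
                    **search_parameters)
        for entry in conn.entries:
            yield entry
        # An empty cookie means the server has no more pages to send.
        cookie = conn.result['controls'][PAGED_RESULTS_OID]['value']['cookie']
        if not cookie:
            break
```

A call would look like iter_paged_entries(conn, 5, search_base='dc=demo1,dc=freeipa,dc=org', search_filter='(objectClass=person)', attributes=['cn', 'givenName']). No request is sent until the first entry is consumed.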

Note

After performing a traditional LDAP Search operation with a SYNC strategy, you get back a collection of Entries in the entries property of the Connection object. This collection behaves like the Entries collection of a Reader cursor. For more comprehensive information about the Search operation, see the SEARCH documentation. An Entry in the entries collection can be modified by making it Writable and applying modifications to it, as described in the next chapter.