Tutorial: searching LDAP entries
Note
A more pythonic LDAP: LDAP operations look clumsy and hard to use because they reflect the age-old idea that time-consuming operations should be done on the client, so as not to clutter and hog the server with unneeded processing. ldap3 includes a fully functional Abstraction Layer that lets you interact with the DIT in a modern and pythonic way. With the Abstraction Layer you don’t need to issue any LDAP operation directly.
Finding entries
To find entries in the DIT you must use the Search operation. This operation has a number of parameters, but only two of them are mandatory:
search_base: the location in the DIT where the search will start
search_filter: a string that describes what you are searching for
Search filters are based on assertions and look odd when you’re unfamiliar with their syntax. An assertion is a bracketed expression that affirms something about an attribute and its values, such as (givenName=John) or (maxRetries>=10). On the server, each assertion resolves to True, False, or Undefined (which is treated as False) for one or more entries in the DIT. Assertions can be combined in boolean groups where all assertions (and group, specified with &) or at least one assertion (or group, specified with |) must be True. A single assertion can be negated (not group, specified with !). Each group must be bracketed, allowing for recursive filters.
Operators allowed in an assertion are = (equal), <= (less than or equal), >= (greater than or equal), =* (present), ~= (approximate), and := (extensible). Surprisingly, the less than and greater than operators don’t exist in the LDAP filter syntax. The approximate and extensible operators are obscure and seldom used. In an equality assertion you can use the * character as a wildcard, which turns it into a substring match.
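As an illustration of the wildcard, the following client-side sketch checks whether a value would satisfy a substring assertion such as (mail=*@example.org). The helper substring_match is hypothetical and only approximates what the server does: it assumes a case-insensitive matching rule (the real rule comes from the attribute's schema definition) and ignores filter escape sequences.

```python
import re


def substring_match(pattern: str, value: str) -> bool:
    """Return True if value matches an LDAP-style pattern where * is a wildcard.

    Hypothetical helper: assumes a case-insensitive matching rule and does not
    handle filter escape sequences.
    """
    # Escape each literal chunk, then let every * match any run of characters
    regex = '.*'.join(re.escape(chunk) for chunk in pattern.split('*'))
    return re.fullmatch(regex, value, flags=re.IGNORECASE) is not None


print(substring_match('*@example.org', 'john@example.org'))  # True
print(substring_match('*@example.org', 'john@example.com'))  # False
print(substring_match('J*n', 'John'))                        # True
```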
For example, to search for all users named John with an email ending with ‘@example.org’ the filter will be (&(givenName=John)(mail=*@example.org)), to search for all users named John or Fred with an email ending in ‘@example.org’ the filter will be (&(|(givenName=Fred)(givenName=John))(mail=*@example.org)), while to search for all users that have a givenName different from Smith the filter will be (!(givenName=Smith)).
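Because groups nest, filters can be assembled from small pieces. The helpers below (assertion, and_, or_, not_) are hypothetical and not part of ldap3; they simply rebuild the example filters above. They insert values verbatim, so any value coming from user input should be escaped first (ldap3.utils.conv offers escape_filter_chars for this).

```python
def assertion(attribute: str, operator: str, value: str) -> str:
    """Build a single bracketed assertion, e.g. (givenName=John)."""
    return f'({attribute}{operator}{value})'


def and_(*filters: str) -> str:
    """All nested filters must be True."""
    return '(&' + ''.join(filters) + ')'


def or_(*filters: str) -> str:
    """At least one nested filter must be True."""
    return '(|' + ''.join(filters) + ')'


def not_(filter_: str) -> str:
    """Negate exactly one nested filter."""
    return '(!' + filter_ + ')'


johns_or_freds = and_(
    or_(assertion('givenName', '=', 'Fred'), assertion('givenName', '=', 'John')),
    assertion('mail', '=', '*@example.org'),
)
print(johns_or_freds)
# (&(|(givenName=Fred)(givenName=John))(mail=*@example.org))
print(not_(assertion('givenName', '=', 'Smith')))
# (!(givenName=Smith))
```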
Long search filters are difficult to understand. It may be useful to split the text over multiple indented lines:
(&
    (|
        (givenName=Fred)
        (givenName=John)
    )
    (mail=*@example.org)
)
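If you prefer to keep the indented layout in your source code, a small hypothetical helper can collapse it into a single line before you pass it to search(). It strips only the whitespace at the start and end of each line, so spaces inside values are preserved; it assumes that no line break falls inside an assertion value.

```python
def compact_filter(text: str) -> str:
    """Join a multi-line, indented filter into a single line.

    Only leading/trailing whitespace of each line is removed, so a value such
    as (cn=John Smith) keeps its inner space.
    """
    return ''.join(line.strip() for line in text.splitlines())


readable = """
(&
    (|
        (givenName=Fred)
        (givenName=John)
    )
    (mail=*@example.org)
)
"""
print(compact_filter(readable))
# (&(|(givenName=Fred)(givenName=John))(mail=*@example.org))
```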
Let’s search for all users on the FreeIPA demo LDAP server:
>>> from ldap3 import Server, Connection, ALL
>>> server = Server('ipa.demo1.freeipa.org', get_info=ALL)
>>> conn = Connection(server, 'uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org', 'Secret123', auto_bind=True)
>>> conn.search('dc=demo1,dc=freeipa,dc=org', '(objectclass=person)')
True
>>> conn.entries
[DN: uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org
, DN: uid=manager,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org
, DN: uid=employee,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org
, DN: uid=helpdesk,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org
]
Here you request all the entries of class person, starting from the dc=demo1,dc=freeipa,dc=org context with the default subtree scope. You have not requested any attribute, so the response contains only the Distinguished Name of each entry found.
Note
response vs result: in ldap3 every operation has a result that, with sync strategies, is stored in the result attribute of the Connection object. Search operations store the found entries in the response attribute of the Connection object. For asynchronous strategies you must use the get_response(id) method, which returns a tuple in the form (response, result). If you pass the get_request=True parameter, get_response() also returns the request dictionary, so the returned tuple will be (response, result, request).
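With sync strategies you can also walk the raw response list instead of entries. The sketch below assumes that each item of the response is a dict with a type key ('searchResEntry' for entries, 'searchResRef' for continuation references) and dn/attributes keys; the sample data is hand-written to match that assumed shape, not real server output, and entry_dns is a hypothetical helper.

```python
from typing import Any, Dict, List


def entry_dns(response: List[Dict[str, Any]]) -> List[str]:
    """Return the DNs of the entries in a search response, skipping references."""
    return [item['dn'] for item in response if item.get('type') == 'searchResEntry']


# Hand-written sample in the assumed shape of conn.response after a sync search
sample_response = [
    {'type': 'searchResEntry',
     'dn': 'uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org',
     'attributes': {'sn': ['Administrator']},
     'raw_attributes': {'sn': [b'Administrator']}},
    {'type': 'searchResRef',
     'uri': ['ldap://other.example.org/dc=example,dc=org']},
]
# Assumed shape of conn.result: 0 is the LDAP success code
sample_result = {'result': 0, 'description': 'success'}

if sample_result['result'] == 0:
    print(entry_dns(sample_response))
```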
Now let’s try to request some attributes from the admin user:
>>> conn.search('dc=demo1,dc=freeipa,dc=org', '(&(objectclass=person)(uid=admin))', attributes=['sn', 'krbLastPwdChange', 'objectclass'])
True
>>> conn.entries[0]
DN: uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org - STATUS: Read - READ TIME: 2016-10-09T20:39:32.711000
    krbLastPwdChange: 2016-10-09 10:01:18+00:00
    objectclass: top
                 person
                 posixaccount
                 krbprincipalaux
                 krbticketpolicyaux
                 inetuser
                 ipaobject
                 ipasshuser
                 ipaSshGroupOfPubKeys
                 ipaNTUserAttrs
    sn: Administrator
Warning
When you use attributes in a search filter, it’s a good habit to always include the structural class of the objects you expect to retrieve. Why? You cannot be sure that the attribute you’re searching for is not used in some other object class. Even if you are sure that no other object class uses it, the schema could later be extended with an object class that uses that very same attribute, causing your program to suddenly break for no apparent reason.
Note that the entries attribute of the Connection object is derived from the ldap3 Abstraction Layer and is specially crafted to be used in interactive mode at the >>> prompt. It gives a visual representation of the entry data structure, and each value is properly formatted according to the schema (the date value in krbLastPwdChange is actually stored as b'20161009100118Z', but it’s shown as a Python datetime object). Attributes can be accessed either as object attributes or as dictionary keys, with some additional features such as case-insensitivity and blank-insensitivity. You can get the formatted values and the raw values (the values actually returned by the server) in the values and raw_values attributes:
>>> entry = conn.entries[0]
>>> entry.krbLastPwdChange
krbLastPwdChange: 2016-10-09 10:01:18+00:00
>>> entry.KRBLastPwdCHANGE
krbLastPwdChange: 2016-10-09 10:01:18+00:00
>>> entry['krbLastPwdChange']
krbLastPwdChange: 2016-10-09 10:01:18+00:00
>>> entry['KRB LAST PWD CHANGE']
krbLastPwdChange: 2016-10-09 10:01:18+00:00
>>> entry.krbLastPwdChange.values
[datetime.datetime(2016, 10, 9, 10, 1, 18, tzinfo=OffsetTzInfo(offset=0, name='UTC'))]
>>> entry.krbLastPwdChange.raw_values
[b'20161009100118Z']
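ldap3 already provides case- and blank-insensitive access; the sketch below only shows one way such a lookup can work, by normalizing names (lowercase, spaces removed) before comparing them. InsensitiveAttributes is a hypothetical class, not ldap3's implementation, and the sample values are copied from the output above.

```python
from typing import Any, Dict


def normalize(name: str) -> str:
    """Lowercase the name and drop spaces: 'KRB LAST PWD CHANGE' -> 'krblastpwdchange'."""
    return name.replace(' ', '').lower()


class InsensitiveAttributes:
    """Hypothetical mapping that ignores case and blanks in attribute names."""

    def __init__(self, attributes: Dict[str, Any]) -> None:
        # Index by normalized name, but remember the original spelling
        self._data = {normalize(name): (name, value) for name, value in attributes.items()}

    def __getitem__(self, key: str) -> Any:
        return self._data[normalize(key)][1]

    def __getattr__(self, key: str) -> Any:
        # Only called when normal attribute lookup fails
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __contains__(self, key: object) -> bool:
        return isinstance(key, str) and normalize(key) in self._data


sample = InsensitiveAttributes({
    'krbLastPwdChange': ['2016-10-09 10:01:18+00:00'],
    'sn': ['Administrator'],
})
print(sample['KRB LAST PWD CHANGE'])  # ['2016-10-09 10:01:18+00:00']
print(sample.KRBLastPwdCHANGE)        # same list
```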
Note that the entry status is Read. This doesn’t matter if you only need to retrieve entries from the DIT, but it’s vital if you want to use the ldap3 Abstraction Layer to make the entry Writable and then change or delete its content. The Abstraction Layer also records the time of the last read operation for the entry.
In the previous search operations you specified dc=demo1,dc=freeipa,dc=org as the base of the search, but the entries you got back were in the cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org context of the DIT. So the server walked down every context under the base, applying the filter to each entry in the sub-containers: it performed a whole subtree search. The other possible kinds of search are the single level search (which examines only the entries immediately below the base) and the base object search (which examines only the entry specified as the base). What changes among these kinds of search is the ‘breadth’ of the portion of the DIT that is searched. This breadth is called the scope of the search and can be specified with the search_scope parameter of the search operation. It can take three different values: BASE, LEVEL and SUBTREE. The latter is the default for the search operation, which explains why you got back all the entries in the sub-containers of the base in the previous searches.
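The three scopes can be pictured by checking where an entry's DN sits relative to the base. In real code you would import BASE, LEVEL and SUBTREE from ldap3 and pass one of them as search_scope; the sketch below uses plain strings instead, and its in_scope helper is hypothetical. It compares DNs naively (split on commas, lowercased), so it assumes DNs without escaped commas.

```python
from typing import List

# Plain-string stand-ins for ldap3's scope constants (illustration only)
BASE, LEVEL, SUBTREE = 'BASE', 'LEVEL', 'SUBTREE'


def rdns(dn: str) -> List[str]:
    """Split a DN into normalized RDNs (assumes no escaped commas)."""
    return [rdn.strip().lower() for rdn in dn.split(',')]


def in_scope(dn: str, base: str, scope: str) -> bool:
    """Tell whether the entry dn would be examined by a search with this base and scope."""
    entry, root = rdns(dn), rdns(base)
    depth = len(entry) - len(root)
    # The entry must be the base itself or lie somewhere below it
    if depth < 0 or entry[depth:] != root:
        return False
    if scope == BASE:
        return depth == 0   # only the base entry
    if scope == LEVEL:
        return depth == 1   # only the immediate children of the base
    return True             # SUBTREE: the base and everything below it


base = 'dc=demo1,dc=freeipa,dc=org'
admin = 'uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org'
for scope in (BASE, LEVEL, SUBTREE):
    print(scope, in_scope(admin, base, scope))
# BASE False
# LEVEL False
# SUBTREE True
```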
You can get an LDIF representation of an entry with:
>>> print(conn.entries[0].entry_to_ldif())
version: 1
dn: uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org
objectclass: top
objectclass: person
objectclass: posixaccount
objectclass: krbprincipalaux
objectclass: krbticketpolicyaux
objectclass: inetuser
objectclass: ipaobject
objectclass: ipasshuser
objectclass: ipaSshGroupOfPubKeys
krbLastPwdChange: 20161009100118Z
sn: Administrator
# total number of entries: 1
Note
LDIF stands for LDAP Data Interchange Format and is a textual standard used to describe two different aspects of LDAP: the content of an entry (LDIF-CONTENT) and the changes performed on an entry with an LDAP operation (LDIF-CHANGE). LDIF-CONTENT is used to describe LDAP entries in a stream (e.g. a file or a socket), while LDIF-CHANGE is used to describe the Add, Delete, Modify and ModifyDn operations.
These two formats have different purposes and cannot be mixed in the same stream.
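To show what LDIF-CONTENT looks like, here is a minimal sketch of an entry serializer following RFC 2849. entry_to_ldif_content is hypothetical (use entry_to_ldif() in practice); it writes only printable ASCII values as plain text, base64-encodes everything else after a double colon, and does not fold long lines. The accented value in the usage example is made up to trigger the base64 branch.

```python
import base64
from typing import Dict, List


def ldif_line(name: str, value: str) -> str:
    """Format one 'name: value' line, base64-encoding values that are not LDIF-safe."""
    raw = value.encode('utf-8')
    # Conservative check: printable ASCII only, no leading space/colon/'<', no trailing space
    safe = (
        all(0x20 <= byte < 0x7f for byte in raw)
        and not value.startswith((' ', ':', '<'))
        and not value.endswith(' ')
    )
    if safe:
        return f'{name}: {value}'
    return f'{name}:: {base64.b64encode(raw).decode("ascii")}'


def entry_to_ldif_content(dn: str, attributes: Dict[str, List[str]]) -> str:
    """Build an LDIF-CONTENT record for a single entry."""
    lines = ['version: 1', ldif_line('dn', dn)]
    for name, values in attributes.items():
        lines.extend(ldif_line(name, value) for value in values)
    return '\n'.join(lines) + '\n'


print(entry_to_ldif_content(
    'uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org',
    {'objectclass': ['top', 'person'], 'sn': ['Administrator'], 'description': ['Café admin']},
))
```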
You can also save an entry as a JSON string:
>>> print(entry.entry_to_json())
{
    "attributes": {
        "krbLastPwdChange": [
            "2016-10-09 10:01:18+00:00"
        ],
        "objectclass": [
            "top",
            "person",
            "posixaccount",
            "krbprincipalaux",
            "krbticketpolicyaux",
            "inetuser",
            "ipaobject",
            "ipasshuser",
            "ipaSshGroupOfPubKeys"
        ],
        "sn": [
            "Administrator"
        ]
    },
    "dn": "uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org"
}
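A JSON export like this can be reproduced with the standard library: formatted values that json cannot serialize natively, such as the datetime of krbLastPwdChange, are converted with str(). This is a sketch with hand-written data (entry_data), not how entry_to_json() is implemented.

```python
import json
from datetime import datetime, timezone

entry_data = {
    'dn': 'uid=admin,cn=users,cn=accounts,dc=demo1,dc=freeipa,dc=org',
    'attributes': {
        # A formatted value: a timezone-aware datetime, as in the example above
        'krbLastPwdChange': [datetime(2016, 10, 9, 10, 1, 18, tzinfo=timezone.utc)],
        'sn': ['Administrator'],
    },
}

# default=str converts values json cannot handle natively (here the datetime)
as_json = json.dumps(entry_data, default=str, indent=4, sort_keys=True)
print(as_json)
```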
Searching for binary values
To search for a binary value you must use the RFC 4515 ASCII escape sequence for each byte in the search assertion. ldap3 provides the helper function escape_bytes(byte_value) in ldap3.utils.conv to properly escape a byte sequence:
>>> from ldap3.utils.conv import escape_bytes
>>> unique_id = b'\xca@\xf2k\x1d\x86\xcaL\xb7\xa2\xca@\xf2k\x1d\x86'
>>> search_filter = '(nsUniqueID=' + escape_bytes(unique_id) + ')'
>>> conn.search('dc=demo1,dc=freeipa,dc=org', search_filter, attributes=['nsUniqueId'])
search_filter will contain (nsUniqueID=\ca\40\f2\6b\1d\86\ca\4c\b7\a2\ca\40\f2\6b\1d\86); its repr shows each backslash doubled. The \xx escaping format (a backslash followed by two hexadecimal digits) is specific to LDAP string filters.
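The escaping itself is simple: every byte becomes a backslash followed by two hexadecimal digits. The hypothetical escape_bytes_sketch below reproduces the filter shown above; in real code use ldap3's escape_bytes. Note that print() shows single backslashes, while the repr of the same string doubles them.

```python
def escape_bytes_sketch(data: bytes) -> str:
    """Escape every byte as a backslash plus two lowercase hex digits (RFC 4515 style)."""
    return ''.join(f'\\{byte:02x}' for byte in data)


unique_id = b'\xca@\xf2k\x1d\x86\xcaL\xb7\xa2\xca@\xf2k\x1d\x86'
search_filter = '(nsUniqueID=' + escape_bytes_sketch(unique_id) + ')'
print(search_filter)
# (nsUniqueID=\ca\40\f2\6b\1d\86\ca\4c\b7\a2\ca\40\f2\6b\1d\86)
```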
Entries Retrieval
Raw values for the attributes retrieved in an entry are stored in the raw_attributes dictionary of each item of the response attribute.
ldap3 provides some standard formatters, used to format the values retrieved in a Search operation as specified by the RFCs according to the current schema syntaxes. If the schema is known (with get_info=SCHEMA or get_info=ALL in the Server object) and the check_names parameter of the Connection object is set to True, the attributes dictionary is populated with the formatted values.
If the attribute is defined in the schema as multi-valued, the attribute value is returned as a list (even if only a single value is present); otherwise it’s returned as a single value.
Custom formatters can be added to specify how attribute values are returned. A formatter must be a callable that receives a bytes value and returns an object. It should never raise exceptions: if it is not able to properly format the value, it must return the original value.
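A minimal sketch of a custom formatter, assuming LDAP GeneralizedTime values in the plain UTC form YYYYMMDDHHMMSSZ, plus a hypothetical format_attributes function that applies formatters and unwraps single-valued attributes. The schema knowledge (which attributes are single-valued) is passed in by hand here and is an assumption for illustration; this is not how ldap3 registers or applies formatters.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Set


def format_generalized_time(raw: bytes) -> Any:
    """Formatter sketch: turn b'YYYYMMDDHHMMSSZ' into a timezone-aware datetime.

    Never raises: returns the original bytes when the value cannot be parsed.
    Only the plain UTC form is handled (no fractions, no numeric offsets).
    """
    try:
        parsed = datetime.strptime(raw.decode('ascii'), '%Y%m%d%H%M%SZ')
    except (UnicodeDecodeError, ValueError):
        return raw
    return parsed.replace(tzinfo=timezone.utc)


def format_attributes(
    raw_attributes: Dict[str, List[bytes]],
    formatters: Dict[str, Callable[[bytes], Any]],
    single_valued: Set[str],
) -> Dict[str, Any]:
    """Apply per-attribute formatters; unwrap the list for single-valued attributes."""
    formatted: Dict[str, Any] = {}
    for name, values in raw_attributes.items():
        formatter = formatters.get(name, lambda value: value)  # default: keep raw bytes
        converted = [formatter(value) for value in values]
        formatted[name] = converted[0] if name in single_valued and converted else converted
    return formatted


raw = {'krbLastPwdChange': [b'20161009100118Z'], 'objectclass': [b'top', b'person']}
print(format_attributes(raw, {'krbLastPwdChange': format_generalized_time},
                        single_valued={'krbLastPwdChange'}))  # assumed schema info
```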
What about empty attributes?
The LDAP protocol specifies that an attribute always contains at least one value. An attribute with no values is immediately removed from the stored entry by the LDAP server. This makes it harder to access the entry in your code, because you must always check whether an attribute key is present before accessing its value to avoid exceptions. ldap3 helps you write simpler code: by default, it returns an empty attribute for every attribute you requested in the attributes parameter of the search operation, even if that attribute is not present in the entry. To change this behaviour, set the return_empty_attributes parameter of the Connection object to False.
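The idea behind this default can be sketched as follows: every requested attribute that the server did not send back is added with an empty list, so code can iterate over it without checking for the key first. with_empty_attributes is a hypothetical helper and the returned data is made up; ldap3's own representation of an empty attribute may differ.

```python
from typing import Dict, Iterable, List


def with_empty_attributes(attributes: Dict[str, List[str]],
                          requested: Iterable[str]) -> Dict[str, List[str]]:
    """Return a copy where every requested attribute exists, using [] when missing."""
    filled = {name: list(values) for name, values in attributes.items()}
    # Attribute names are case-insensitive, so compare them lowercased
    present = {name.lower() for name in filled}
    for name in requested:
        if name.lower() not in present:
            filled[name] = []
            present.add(name.lower())
    return filled


returned = {'sn': ['Administrator']}  # made-up server response without 'mobile'
entry = with_empty_attributes(returned, ['sn', 'mobile'])
for number in entry['mobile']:  # no KeyError: the loop simply doesn't run
    print(number)
print(entry)  # {'sn': ['Administrator'], 'mobile': []}
```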
Simple Paged search
The Search operation can perform a simple paged search as specified in RFC 2696. The RFC states that you can ask the server
to return a specific number of entries in each response set. With each search, the server sends back a cookie that you have to
provide in each subsequent search. All this information must be passed in a Control attached to the request and the server responds
with similar information in a Control attached to the response.
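ldap3 builds this control for you, but it may help to see what the Simple Paged Results control value looks like. RFC 2696 defines it as a BER-encoded SEQUENCE of a page size (INTEGER) and a cookie (OCTET STRING); the hypothetical functions below encode it by hand, for illustration only. The control travels under OID 1.2.840.113556.1.4.319, the same OID used to read the cookie from conn.result.

```python
def ber_length(length: int) -> bytes:
    """Encode a BER length: short form below 128, long form otherwise."""
    if length < 0x80:
        return bytes([length])
    encoded = length.to_bytes((length.bit_length() + 7) // 8, 'big')
    return bytes([0x80 | len(encoded)]) + encoded


def ber_tlv(tag: int, content: bytes) -> bytes:
    """Tag, length and value of a single BER element."""
    return bytes([tag]) + ber_length(len(content)) + content


def paged_results_value(size: int, cookie: bytes = b'') -> bytes:
    """SEQUENCE { size INTEGER, cookie OCTET STRING } as in RFC 2696."""
    # Minimal two's-complement encoding of a non-negative INTEGER
    size_bytes = size.to_bytes(size.bit_length() // 8 + 1, 'big')
    return ber_tlv(0x30, ber_tlv(0x02, size_bytes) + ber_tlv(0x04, cookie))


# First request: page size 5, empty cookie
print(paged_results_value(5).hex())  # 30050201050400
```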
ldap3 hides all this machinery in the paged_search() function of the extend.standard namespace:
>>> entries = conn.extend.standard.paged_search('dc=demo1,dc=freeipa,dc=org', '(objectClass=person)', attributes=['cn', 'givenName'], paged_size=5)
>>> for entry in entries:
...     print(entry)
Entries are returned in a generator, which can be useful when you have very long lists of entries or have memory limitations. Also, the generator sends requests to the LDAP server only as entries are consumed. Remember that a generator can be used only once, so you must process the results sequentially. If you don’t want the entries returned in a generator, pass the generator=False parameter to get all the entries in a list. In this case all the paged searches are performed by the paged_search() function and the entries found are collected in a list.
If you want to use the Search operation directly to perform a paged search, your code should look similar to the following:
>>> search_parameters = {'search_base': 'dc=demo1,dc=freeipa,dc=org',
...                      'search_filter': '(objectClass=person)',
...                      'attributes': ['cn', 'givenName'],
...                      'paged_size': 5}
>>> while True:
...     conn.search(**search_parameters)
...     for entry in conn.entries:
...         print(entry)
...     cookie = conn.result['controls']['1.2.840.113556.1.4.319']['value']['cookie']
...     if cookie:
...         search_parameters['paged_cookie'] = cookie
...     else:
...         break
Even in this case ldap3 hides the Simple Paged Results Control machinery, but you have to manage the cookie yourself. The code would be much longer if you managed the Simple Paged Results Control directly. Also, you lose the generator feature.
Note
After performing a traditional LDAP Search operation with a SYNC strategy, you get back a collection of Entries in the entries property of the Connection object. This collection behaves like the Entries collection of a Reader cursor.
For more comprehensive information about the Search operation, see the SEARCH documentation.
An Entry in the entries collection can be modified by making it Writable and applying modifications to it, as described in the next chapter.