Federated analyses of multiple data sources in drug safety studies

Authors Rickard Ljung, Peter Wilén, Wilmar Igl, Rolf Gedeborg, Bénédicte Delcoigne, Karl Michaëlsson, Bodil Svennblad, Nils Feltelius,
License CC-BY-4.0
Received: 1 August 2022       Revised: 30 November 2022         Accepted: 12 December 2022
DOI: 10.1002/pds.5587


REVIEW




Federated analyses of multiple data sources in drug safety
studies

Rolf Gedeborg 1 | Wilmar Igl 2 | Bodil Svennblad 3 | Peter Wilén 4 | Bénédicte Delcoigne 5 | Karl Michaëlsson 3 | Rickard Ljung 6 | Nils Feltelius 6

1 Department of Efficacy and Safety 1, Division of Licensing, Medical Products Agency, Uppsala, Sweden
2 Statistics Group, Department of Efficacy and Safety 2, Division of Licensing, Medical Products Agency, Uppsala, Sweden
3 Department of Surgical Sciences, Unit of Medical Epidemiology, Uppsala University, Uppsala, Sweden
4 Department of Legal Affairs, Medical Products Agency, Uppsala, Sweden
5 Clinical Epidemiology Division, Department of Medicine Solna, Karolinska Institutet, Sweden
6 Division of Use and Information, Medical Products Agency, Uppsala, Sweden

Correspondence
Rolf Gedeborg, Department of Efficacy and Safety 1, Division of Licensing, Medical Products Agency, Uppsala, Sweden.
Email: rolf.gedeborg@lakemedelsverket.se

Funding information
Lakemedelsverket (Swedish Medical Products Agency)

Abstract

Purpose: Studies of rare side effects of new drugs with limited exposure may require pooling of multiple data sources. Federated Analyses (FA) allow real-time, interactive, centralized statistical processing of individual-level data from different data sets without transfer of sensitive personal data.

Methods: We review IT-architecture, legal considerations, and statistical methods in FA, based on a Swedish Medical Products Agency methodological development project.

Results: In a review of all post-authorisation safety studies assessed by the EMA during 2019, 74% (20/27 studies) reported issues with lack of precision in spite of mean study periods of 9.3 years. FA could potentially improve precision in such studies. Depending on the statistical model, the federated approach can generate identical results to a standard analysis. FA may be particularly attractive for repeated collaborative projects where data is regularly updated. There are also important limitations. Detailed agreements between involved parties are strongly recommended to anticipate potential issues and conflicts, document a shared understanding of the project, and fully comply with legal obligations regarding ethics and data protection. FA do not remove the data harmonisation step, which remains essential and often cumbersome. Reliable support for technical integration with the local server architecture and security solutions is required. Common statistical methods are available, but adaptations may be required.

Conclusions: Federated Analyses require competent and active involvement of all collaborating parties but have the potential to facilitate collaboration across institutional and national borders and improve the precision of postmarketing drug safety studies.

KEYWORDS
adverse drug reactions, decentralised analysis, federated analysis, pooled analysis, post-authorisation safety studies, product surveillance, postmarketing, registers




This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium,
provided the original work is properly cited.
© 2022 The Authors. Pharmacoepidemiology and Drug Safety published by John Wiley & Sons Ltd.

Pharmacoepidemiol Drug Saf. 2023;32:279–286.                                                                           wileyonlinelibrary.com/journal/pds     279
                                                                                                                                                              10991557, 2023, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/pds.5587 by CochraneItalia, Wiley Online Library on [13/10/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
280                                                                                                                                       GEDEBORG ET AL.



Key Points
• Federated analysis allows real-time, interactive, centralized statistical analysis on individual-level data, without actual transfer of sensitive personal data between institutions and countries.
• The technique may be particularly attractive for situations where repeated collaborative projects are anticipated, and the cohorts are dynamic.
• Common statistical methods are available as mathematical models and software implementations for federated analysis.
• It is strongly recommended that the division of responsibility, including economic undertakings, and limitations on liability are always documented in a contract or other legally binding document.
• Successful implementation of federated analysis requires competent and active involvement of all collaborating parties.




1 | INTRODUCTION

Characterising rare adverse drug reactions is a common and challenging regulatory concern, especially for new drugs and drugs with limited exposure. The need to combine multiple data sources for such analyses is becoming increasingly important, but combining different sources of individual-level data is a complex process, especially when different countries are involved.

The rapid development of the COVID-19 vaccines and the need for a fast implementation of national vaccination programs have highlighted the need to generate timely post-approval safety data to detect potential uncommon adverse reactions. Having agreements and technical arrangements in place for such pooling of data from multiple sources and countries would greatly facilitate the generation of timely study results.

Federated analysis (FA) is a technique that may facilitate a centralised combined analysis of multiple decentralised data sources without requiring actual data merging. We review the findings in a FA development project conducted by the Swedish Medical Products Agency covering computer engineering aspects, limitations and opportunities of statistical methods, the legality and regularity of FA involving the processing of personal data, and tools for validating implementation. The focus is on FA that can generate results identical to those from individual level pooling of data in comparative epidemiological studies requiring regression analysis to control for confounding.

2 | WHAT ARE FEDERATED ANALYSES?

Federated analyses allow real-time, interactive, central statistical analyses in a system of federated databases, without transferring individual level data outside the protective security mechanisms of the original host system (Figure 1).1 It is therefore an apparently attractive alternative to physical merging of data and there are several examples of initiatives implementing different forms of FA for pharmacoepidemiology.2–6 That the federated approach provides identical results compared to analyses of physically merged data has been demonstrated for commonly used statistical models, for example, Generalized Linear Regression models.7–9

In FA all individual-level information remains protected behind the normal security mechanisms of the local host system, and the data owner retains full control over the minimum level of aggregation required to allow data to be viewed by the analyst, with protection against unauthorized use (Figure 1). From a statistical perspective, FA allows bi-directional exchange of information between a statistical model and sensitive individual-level patient data via anonymized group-level summary results, without sharing information on individuals. Thus, the data owners retain full local control over security settings that determine the level of aggregation required to protect sensitive personal information.

Whenever multiple raw datasets are used for a study, there is a need to create common study variables, a “Common Data Model”, with common structure and format for the variables.10–12 This is a prerequisite for FA as well as for any other strategy for combined analysis of individual level data from different data sources. It is essential that the variables have similar definitions by harmonisation using transparent and well documented algorithms.

A two-step meta-analysis is another alternative to direct access to combined individual level data. With harmonised data at each data node, and a distributed code for analysis, the results from each data node can then be aggregated using conventional meta-analysis. In many instances this will generate results similar to a one-step analysis on individual level data.13–15 One potential limitation is that parameters for covariates may be inconsistently estimated across different data sets, but this is not necessarily a concern.

In principle, FA and two-step meta-analysis are statistically equivalent. The main advantage of FA is that it allows real-time, interactive, centralized, standardized analysis of distributed data, that is, without having to ask data owners to perform specific analysis and provide the results. The main disadvantage is that the required IT infrastructure is considerably more complex for FA than for two-step meta-analyses. While meta-analysis can be performed on standard statistical software and personal computers, FA requires a complex software stack on a federated client–server architecture.
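The aggregation step of the two-step meta-analysis described in Section 2 can be sketched as follows. This is a minimal illustration assuming fixed-effect inverse-variance pooling, which is one common choice and not necessarily the method used in the cited studies. The function name `pool_fixed_effect` and the per-site estimates are hypothetical and chosen only for illustration.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-site estimates.

    Each data node contributes only an estimate (e.g., a log hazard ratio)
    and its standard error; no individual-level data leaves the node.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical log hazard ratios and standard errors from three data nodes.
site_log_hr = [0.40, 0.25, 0.55]
site_se = [0.20, 0.30, 0.25]

log_hr, se = pool_fixed_effect(site_log_hr, site_se)
ci_low, ci_high = log_hr - 1.96 * se, log_hr + 1.96 * se
print(f"pooled HR = {math.exp(log_hr):.2f} "
      f"(95% CI {math.exp(ci_low):.2f} to {math.exp(ci_high):.2f})")
```

The pooled standard error is smaller than any single site's standard error, which is the gain in precision that motivates combining data sources.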




[Figure 1: schematic with four data sources (1 to 4), each behind its organisation's normal data protection mechanisms ("firewalls"), linked to a central analysis computer.]

FIGURE 1 A schematic representation of the concept of a Federated analysis, here illustrated by four separate data sources. The red boxes denote each data owner's normal security mechanisms for protection of sensitive personal information. The analysis task is submitted to each data server and executed locally. All summary results are then sent back to the analysis client, where they are integrated to a single result. This process may be iterated, and results optimized until the analysis has converged to a final result.
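The iterative process in Figure 1 can be illustrated with a small sketch of a federated logistic regression. It is a simplified illustration under stated assumptions and not the DataSHIELD implementation: the model has an intercept and one covariate, the fitting method is plain Newton-Raphson, and the function names (`local_summaries`, `fit`) and the two toy data sets are hypothetical. Each site returns only aggregated sums (score vector and information matrix), which the analysis client adds up before updating the coefficients.

```python
import math


def local_summaries(rows, beta):
    """Executed at a data node: return aggregated score and information sums only."""
    b0, b1 = beta
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted probability
        r = y - p                                   # residual
        w = p * (1.0 - p)                           # variance weight
        g0 += r
        g1 += r * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def fit(sites, n_iter=25, tol=1e-10):
    """Executed at the analysis client: combine site summaries, update, repeat."""
    beta = (0.0, 0.0)
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for rows in sites:
            (a, b), (c, d, e) = local_summaries(rows, beta)
            g0, g1 = g0 + a, g1 + b
            h00, h01, h11 = h00 + c, h01 + d, h11 + e
        # Newton step: solve the 2x2 system (information matrix) * step = score.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (-h01 * g0 + h00 * g1) / det
        beta = (beta[0] + step0, beta[1] + step1)
        if max(abs(step0), abs(step1)) < tol:
            break
    return beta


# Hypothetical (exposure, outcome) records held at two separate sites.
site_a = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]
site_b = [(0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (0, 0)]

federated = fit([site_a, site_b])   # only summaries cross site boundaries
pooled = fit([site_a + site_b])     # conventional analysis of merged data
print(federated, pooled)
```

Because the score and information sums over all patients equal the sum of the per-site sums, the federated and pooled fits agree up to floating-point rounding, which is the sense in which a federated GLM can reproduce a standard analysis.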




3 | HOW CAN FEDERATED ANALYSES FACILITATE POST-MARKETING STUDIES FROM A REGULATORY PERSPECTIVE?

Characterisation of drug safety profiles is one of the core missions of regulatory agencies. Data available at the time of a marketing authorization is often not sufficient to fully characterise safety, and non-interventional post-authorisation safety studies (PASS) using existing health-care databases are often required. A current example is the characterisation of myocarditis as an adverse reaction from mRNA COVID-19 vaccines.16

To assess the practical extent of insufficient sample size in PASS, and the potential of FA to address this concern, we reviewed regulatory assessment reports of PASS final results on the agenda for the plenary meetings of the European Medicines Agency Pharmacovigilance Risk Assessment Committee (PRAC) in 2019. Results from inferential studies involving modelling of covariates were selected for review. The review was restricted to 27 non-interventional PASS with an inferential design and study objective motivating regression analysis (Figure 2). The time periods observed in these studies were on average 9.3 years, ranging from 5 to 19 years. Despite this, 74% (20/27) of the assessments reported issues with poor precision of estimates.

A total of 20 studies were performed using a single data source, but in 70% (14/20) of these we still assessed a FA approach as applicable, based on availability of other similarly structured data sources that could have been used. In one of seven studies based on multiple data sources a FA approach had been used, and four studies were considered potentially suitable since multiple similarly structured data sources were used in the study but without pooling of individual level data. This suggests that FA could be applicable in the majority of studies involving multiple datasets.

4 | WHICH STATISTICAL ANALYSES ARE AVAILABLE AND WHAT ARE THE LIMITATIONS?

In our review of PASS the most common statistical methods used were Cox proportional hazards regression (59%; 16/27) and logistic regression (15%; 4/27). Negative binomial regression, Poisson regression, generalised estimating equations (GEE), LASSO regression, and regularised regression were applied in only a few studies. Models incorporating random effects were used in 22% (6/27) of these studies. Propensity scores (PS) were used in 44% (12/27) of the studies. The extent and potential consequences of missing data were reviewed in only 52% (14/27) of the assessment reports. In 48% (13/27) of these studies missing data was stated as a potential concern. The methods used to handle missing data were single imputation (4/13), multiple imputation (3/13), missing category (2/13), complete case analysis (1/13), no method (1/13), or not stated (2/13).

The following sections discuss opportunities and limitations regarding statistical methods and related issues in relation to the findings in our review of PASS study results (Supplemental online-only material: Report Federated analyses – Statistical methods).

4.1 | Horizontally or vertically partitioned data

A standard FA requires that the data is horizontally partitioned, meaning that all data sources include different sets of patients, but with the same variables. This is also the key type of data pooling needed to increase the sample size and precision of estimates. In vertically partitioned data, on the other hand, all data nodes include the same set of patients, but with different sets of variables. This type of data pooling



FIGURE 2 Flow chart describing selection of post-authorisation safety studies (PASS) where final study results were assessed by the PRAC in 2019. Studies with an aim involving inferential analyses with modelling of multiple variables were assessed for their potential suitability for Federated analysis (FA).




is also of interest,17–19 but FA on vertically partitioned data requires the involvement of a trusted third-party facility.20

4.2 | Generalized linear models

Generalized linear models (GLMs) include many common regression models, for example, linear regression, logistic regression, or Poisson regression, which have a broad range of applications in epidemiology. They can be used in FA in a form that gives identical results to the standard formulation of GLMs.7–9 GLMs are implemented, for example, in the R/DataSHIELD package for FA.21

4.3 | Time-to-event analyses

The equivalence of the Cox proportional hazards model in a FA compared to a standard analysis has been demonstrated.22 This approach requires that distinct event times are shared between sites, which potentially could identify a patient. There are also limitations in handling high-dimensional data.

A federated meta-analysis approach using the Cox model has recently been added to the R/DataSHIELD package, performing the Cox model separately in each federated data set.23 The results can then be combined using meta-analysis. In case of rare events the result from such an analysis can be biased because of the normal approximation of the likelihood function.24 Other options for the likelihood function approximation exist but are not yet implemented as FA; they are available only for use in a traditional meta-analysis approach (e.g., the package R/EvidenceSynthesis).24

The only option to analyse time to event data federated on individual data, without sharing event time between sites and without using the meta-analysis approach, is to approximate the Cox proportional hazard model with a GLM using Poisson regression.25,26

4.4 | Propensity scores

Propensity scores based on measured baseline covariates are commonly used to control for confounding.27,28 They can be calculated using logistic-binomial regression, which is available as FA. In a FA the propensity scores can either be site-specific, where a model is fitted separately in each site with the possibility of including site-specific measurement covariates to control within-site confounding,29 or fitted using harmonized variables and all observations as if pooled to control between-site confounding. The propensity score can then be used for stratification, reweighting or adjustment, but matching across sites will not be possible.

4.5 | Missing data

When a complete case analysis is inappropriate, trivial methods for imputation of missing values, such as by imputation of the mean, can be applied.30 If data is missing-at-random (MAR), maximum likelihood estimation (without imputation) or multiple imputation are recommended.31 Multiple imputation by Chained Equations (MICE)32 can be used in FA and is under development to be integrated in the DataSHIELD software.33 If data is missing-not-at-random (MNAR), model-based imputation is required, but has not been applied in FA to our knowledge. When a variable is missing completely at one or more data sites a possible solution is to replace the covariate based on a prediction model (including uncertainty) using other correlated variables from other data sources. This is equally applicable for physically merged data and FA.

4.6 | Limitations

Statistical analyses that require the combined or joint distribution of individual-level data from multiple data sources, or are based on non-parametric empirical distributions, which cannot be approximated by parametric distributions, will not be possible or are technically



challenging. Examples of such analyses are the identification of duplicated patient records across datasets, the calculation of the median across multiple datasets, or the analysis of variables with non-parametric statistical distributions whose properties cannot be adequately summarized by a parametric distribution. These issues can be solved by

• the use of alternative statistical measures, for example, by calculating the (weighted) mean instead of the median, or the standard deviation instead of range,
• the use of alternative distributions, for example, deriving the median from a fitted parametric distribution,
• the transformation of a variable, for example, from a complex, continuous distribution to a categorical distribution, or
• the comparison of encrypted patient-identifying features.20

4.7 | Computation time

In an FA all participating data servers must be up and running simultaneously for the duration of the analysis. Computation time for a FA is determined by the network latency or lowest specification hardware in the overall system, and will also depend on the number of steps in iterative statistical analyses.18 The statistical models may have to be re-formulated in a way that requires the computation of additional parameters. For example, a log-Poisson Generalized Linear Model requires additional parameters compared to a Cox proportional hazards model, and therefore increased computation time.34 Additional processing may also be required to apply data disclosure controls.18

5 | WHAT TO CONSIDER REGARDING IT ARCHITECTURE?

Federated analysis requires a complex software stack of applications to coordinate data management and data analysis. One example is the open-source OBiBa software application suite developed in the Maelstrom Research project.8,18,35,36 This aims to facilitate epidemiological research using multiple, physically separate data sources. This involves collecting and harmonising data from different databases, publishing general information about the content of data sources, and creating tools for FA. There is a strong collaboration with the Data to Knowledge Research Group (D2K) at Newcastle University, United Kingdom, which is spearheading the development of DataSHIELD, the key software for FA within the R software for statistical computing. All software developed in Maelstrom Research has a GPL v3 open-source licence.37

The Maelstrom Research project software was used in the Swedish Medical Products Agency development project as an example because it is rooted in a non-profit organisation, uses open-source code, and is used by academic research groups in Sweden. There are several other technical solutions for FA, but it was not within the scope of this project to compare different software.

Federated analysis requires a basic IT setup with a central access server with RStudio software, which connects to several DataSHIELD servers via secure permanent VPN channels. The client (i.e., analyst) computer has a secure connection to this central access server. Once this central node server is set up at one of the locations, the access to all other servers is through this node server. The implementation and maintenance of such a system require a strong commitment from the IT support teams at all locations. Integration with existing local IT architecture and IT policies may be challenging (Supplemental online-only material: Report Federated analyses - IT architecture).

6 | WHAT LEGAL IMPLICATIONS SHOULD BE CONSIDERED?

The main advantage of FA is that the processing happens at the node level, which gives more control over sensitive personal data to the local research principals at each node. While the General Data Protection Regulation (GDPR) is a common legislation for all EU countries, national legislation for ethical approval of research may still result in differences between EU countries in views on access to sensitive personal information for research. When FA is used, the local research principal at each node performs several specific processing operations on personal data in the stage prior to the actual FA. The statistical analysis performed locally at the node constitutes a personal data processing operation. The numerical estimates delivered to the central server by each node are not to be construed as personal data when data only consists of aggregated results from statistical calculations.

Responsibility for personal data is based on the real influence an actor has over each processing operation. The term controller means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purpose and means of the processing of personal data. The term processor means a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller. There may be multiple controllers for the various processing operations performed. In FA an actor may have a real influence over a given processing operation even if no individual level data has been transferred to that actor. It is therefore appropriate to establish early in the planning phase if there will be one or more research projects and distribute roles and responsibilities accordingly. All agreements should be in writing and regulate roles and responsibilities, as well as liability for any failure to comply with the terms of the agreement.

In traditional register-based research, the role of the controller for different processing activities tends to be assigned based on where the personal data is being processed. This means the role of controller may be transferred between organisations and research principals together with the personal data. When FA is used, responsibility for processing may be divided among the participating research principals, even if the personal data in question is not transferred. It is crucial to clarify procedures, roles, and relationships to determine who is a controller. All collaborating parties should be aware of the division of responsibilities to avoid any party being able to influence the purpose



MODEL 1: Data processing agreements
The research principal for Node 1 is the controller of all personal data
processed in the federated analysis. The research principals for Nodes 2 and
3 are processors.

MODEL 2: Sole overall responsibility
A single research principal with overall responsibility, in combination with
nodes with independent responsibility.

MODEL 3: Joint controllers
Joint overall responsibility in combination with nodes with independent
responsibility.

FIGURE 3 Three theoretical models for the division of controller legal
responsibility in a research project that uses federated analysis. The
models illustrate the influence exerted by one or more research principals
over operations performed in the central server and in the nodes. The models
assume that the operations described can be deemed to fall within the scope
of the General Data Protection Regulation (GDPR); for example, in terms of
responsibility for implementing security measures or technical measures.
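
The models in Figure 3 refer to operations performed in the nodes and in the
central server. As a rough illustration of what such operations can look
like, the following minimal Python sketch lets each node reduce its local
individual-level records to two totals (events and person-years), which the
central server then pools into one incidence rate. It is not DataSHIELD or
OBiBa code. All names (NodeSummary, summarise_at_node, pool_at_server,
rate_with_ci) and the synthetic counts are hypothetical and chosen for
illustration only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSummary:
    """Aggregate returned by a node; no individual-level rows are included."""
    node: str
    events: int
    person_years: float


def summarise_at_node(node, records):
    """Node-side operation: reduce local (had_event, person_years) rows to totals."""
    events = sum(1 for had_event, _ in records if had_event)
    person_years = sum(py for _, py in records)
    return NodeSummary(node, events, person_years)


def pool_at_server(summaries):
    """Central-server operation: add up the totals received from the nodes."""
    return NodeSummary(
        "pooled",
        sum(s.events for s in summaries),
        sum(s.person_years for s in summaries),
    )


def rate_with_ci(events, person_years, z=1.96):
    """Incidence rate with an approximate 95% confidence interval.

    Uses the normal approximation on the log scale,
    SE(log rate) = 1 / sqrt(events), so it needs at least one event.
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("need at least one event and positive person-time")
    rate = events / person_years
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


def make_records(n_people, n_events):
    """Synthetic local data (assumption): one person-year per person."""
    return [(i < n_events, 1.0) for i in range(n_people)]


# Hypothetical data held locally at three nodes; only summaries are shared.
local_data = {
    "node_1": make_records(10_000, 4),
    "node_2": make_records(20_000, 9),
    "node_3": make_records(15_000, 7),
}
summaries = [summarise_at_node(name, rows) for name, rows in local_data.items()]
pooled = pool_at_server(summaries)

for s in summaries + [pooled]:
    rate, lower, upper = rate_with_ci(s.events, s.person_years)
    print(f"{s.node}: {1000 * rate:.2f} per 1000 person-years "
          f"(95% CI {1000 * lower:.2f} to {1000 * upper:.2f})")
```

In this toy setting the pooled interval is narrower than that of any single
node, which is the gain in precision that motivates FA for rare adverse
events. Which research principal is responsible for the node-side and
server-side steps is what the three models are meant to settle.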




or means of processing, thus altering the de facto control of processing.
The roles of controller and processor are distributed based on how
responsibility for the purpose and means of processing is to be allocated.
This in turn depends on the intended nature of collaboration and the
influence that each of the participating research principals has on the
analysis. Although the primary responsibility is to the data subject, there
is also a responsibility to the other parties involved in the project.
    We propose three different theoretical models for the distribution of
responsibility as controller of personal data in FA (Figure 3). They are
intended to facilitate an analysis where actors can determine the purpose
and means of personal data processing. It is important that policymakers,
researchers, technicians, lawyers, and others with key roles in a research
project are sufficiently familiar with how FA works, how data protection
regulations should be applied, and the intention behind distributing
responsibility in the project. If a processing operation serves common
scientific interests, the operators jointly determine the purposes of
processing, even if they each have their own specific purposes at an earlier
or later stage.
    The question of who is to be considered the controller of the personal
data being processed is important, given that the controller shall ensure
compliance with GDPR in all processing operations for which they are
responsible. This implies that the controller shall ensure that data
subjects are informed about how their personal data is being processed. Data
subjects have the right to obtain the rectification or erasure of personal
data concerning them. The controller or processor is also liable to
compensate any person who has suffered damage.
    There is no requirement pursuant to GDPR for a written agreement between
joint controllers, but it is strongly recommended that the division of
responsibility, including economic undertakings, and limitations on
liability are always documented in a contract or other legally binding
document. The division of responsibility and how processing is to be
performed within the project must be unambiguous and transparent.

7 | CONCLUSIONS

Federated analysis allows real-time, interactive, centralized statistical
analyses on individual-level data, without actual transfer of sensitive
personal data between institutions and countries. It has the potential to
facilitate collaboration and improve the precision of postmarketing safety
studies, by increasing the quantity, variety, and availability of data
needed to study rare adverse events. Our review of post-authorisation
studies indicates that lack of precision in such studies is a common
limitation. The technique may be particularly attractive for situations
where repeated collaborative projects are anticipated and the cohorts are
dynamic. A recent example is the need to characterise the postmarketing
safety of COVID-19 vaccines in a timely manner. The Nordic countries, having
very similar nationwide health data resources, would be an attractive area
for such collaborations. Rheumatic diseases can serve as another example of
a therapeutic area that has seen a rapid development of new medicinal
products and a need to further characterise their safety profile in clinical
practice, and where there are existing disease registers with similar
structures in several different countries.38 Such situations would likely
benefit from using FA.
    Federated analysis does not remove the data harmonisation step, requires
reliable support for integrating the FA-specific IT architecture with the
respective organisation's general IT architecture and security solutions,
and should be based on clear and detailed agreements between involved
parties to fully comply with legal obligations. Common statistical methods
are available as
mathematical models and software implementations for FA. The implementation
of FA requires competent and active involvement of all collaborating
parties.
    The four full reports from the Swedish Medical Products Agency on IT
architecture, legal considerations, statistical methods, and a tutorial are
provided in the online Appendix.

AUTHOR CONTRIBUTIONS
Rolf Gedeborg conceptualised and drafted the initial version of the
manuscript. All other authors contributed with critical review of the
manuscript.

FUNDING INFORMATION
This research was conducted as a development project by the Medical Products
Agency, which is a Swedish Government agency. The study did not receive any
external funding.

CONFLICT OF INTEREST
Dr Ljung reported receiving grants from Sanofi Aventis paid to his
institution outside the submitted work; and receiving personal fees from
Pfizer outside the submitted work. All other authors declare no conflict of
interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the
corresponding author upon reasonable request.

ORCID
Rolf Gedeborg https://orcid.org/0000-0002-8850-7863

REFERENCES
 1. Azevedo LG, Soares EFdS, Souza R, Moreno MF. Modern Federated Database
    Systems: An Overview. 2020.
 2. Yamaguchi M, Inomata S, Harada S, et al. Establishment of the MID-NET®
    medical information database network as a reliable and valuable database
    for drug safety assessments in Japan. Pharmacoepidemiol Drug Saf.
    2019;28:1395-1404.
 3. Exploring and understanding adverse drug reactions by integrative mining
    of clinical records and biomedical knowledge. European Commission, 2019.
    Accessed 2019-12-06, at
    https://cordis.europa.eu/project/rcn/85424/factsheet/en
 4. Platt R, Brown JS, Robb M, et al. The FDA sentinel initiative - an
    evolving National Resource. N Engl J Med. 2018;379:2091-2093.
 5. Trifiro G, Fourrier-Reglat A, Sturkenboom MCJM, Díaz Acedo C, Van Der
    Lei J, EU-ADR Group. The EU-ADR project: preliminary results and
    perspective. Stud Health Technol Inform. 2009;148:43-49.
 6. The Book of OHDSI. 2021. Accessed 2022-10-10, at
    https://ohdsi.github.io/TheBookOfOhdsi/
 7. Jones E, Sheehan N, Masca N, Wallace S, Murtagh M, Burton P.
    DataSHIELD – shared individual-level analysis without sharing data: a
    biostatistical perspective. Norwegian J Epidemiol. 2012;21:231-239.
 8. Wolfson M, Wallace SE, Masca N, et al. DataSHIELD: resolving a conflict
    in contemporary bioscience--performing a pooled analysis of
    individual-level data without sharing the data. Int J Epidemiol.
    2010;39:1372-1382.
 9. Wu Y, Jiang X, Kim J, Ohno-Machado L. Grid binary LOgistic REgression
    (GLORE): building shared models without sharing data. J Am Med Informat
    Assoc. 2012;19:758-764.
10. The Book of OHDSI. Chapter 4: The Common Data Model. 2021. Accessed
    2022-10-10, at https://ohdsi.github.io/TheBookOfOhdsi/
11. Cohen JM, Cesta CE, Kjerpeseth L, et al. A common data model for
    harmonization in the Nordic pregnancy drug safety studies (NorPreSS).
    Norsk Epidemiologi. 2021;29:117-123.
12. Gini R, Sturkenboom MCJ, Sultana J, et al. Different strategies to
    execute multi-database studies for medicines surveillance in real-world
    setting: a reflection on the European model. Clin Pharmacol Ther.
    2020;108:228-235.
13. Scotti L, Rea F, Corrao G. One-stage and two-stage meta-analysis of
    individual participant data led to consistent summarized evidence:
    lessons learned from combining multiple databases. J Clin Epidemiol.
    2018;95:19-27.
14. Lin DY, Zeng D. Meta-analysis of genome-wide association studies: no
    efficiency gain in using individual participant data. Genet Epidemiol.
    2010;34:60-66.
15. Lin DY, Zeng D. On the relative efficiency of using summary statistics
    versus individual-level data in meta-analysis. Biometrika.
    2010;97:321-332.
16. Karlstad Ø, Hovi P, Husby A, et al. SARS-CoV-2 vaccination and
    myocarditis in a Nordic cohort study of 23 million residents. JAMA
    Cardiol. 2022;7:600-612.
17. Cheung Y-m, Lou J, Yu F. Vertical Federated Principal Component Analysis
    on Feature-Wise Distributed Data. In Web Information Systems
    Engineering – WISE 2021: 22nd International Conference on Web
    Information Systems Engineering, WISE 2021, October 26–29, 2021.
    Melbourne, VIC, Australia: Springer-Verlag, Berlin, Heidelberg.
    2021;173-188.
18. Wilson RC, Butters OW, Avraam D, et al. DataSHIELD – new directions and
    dimensions. Data Sci J. 2017;16:1-21.
19. Li Y, Jiang X, Wang S, Xiong H, Ohno-Machado L. VERTIcal grid lOgistic
    regression (VERTIGO). J Am Med Informat Assoc. 2016;23:570-579.
20. Snackerstrom T, Johansen C. De-identified linkage of data across
    separate registers: a proposal for improved protection of personal
    information in registry-based clinical research. Ups J Med Sci.
    2019;124:29-32.
21. DataSHIELD CRAN - The Comprehensive R Archive Network of DataSHIELD.
    2020. Accessed 2020-10-21, at https://cran.datashield.org/
22. Lu CL, Wang S, Ji Z, et al. WebDISCO: a web service for distributed cox
    model learning without patient-level data sharing. J Am Med Informat
    Assoc. 2015;22:1212-1219.
23. Banerjee S, Sofack GN, Papakonstantinou T, et al. dsSurvival: privacy
    preserving survival models for federated individual patient
    meta-analysis in DataSHIELD. BMC Res Notes. 2022;15:197.
24. Schuemie MJ, Chen Y, Madigan D, Suchard MA. Combining cox regressions
    across a heterogeneous distributed research network facing small and
    zero counts. Stat Methods Med Res. 2022;31:438-450.
25. McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. Chapman &
    Hall/CRC; 1998.
26. Carstensen B. Who Needs the Cox Model Anyway? (Version 7). In: Steno
    Diabetes Center C, Denmark, ed. http://bendixcarstensen.com/WntCma.pdf.
    2019.
27. Ali MS, Prieto-Alhambra D, Lopes LC, et al. Propensity score methods in
    health technology assessment: principles, extended applications, and
    recent advances. Front Pharmacol. 2019;10:973.
28. Rassen JA, Schneeweiss S. Using high-dimensional propensity scores to
    automate confounding control in a distributed medical product safety
    surveillance system. Pharmacoepidemiol Drug Saf. 2012;21(Suppl 1):41-49.
29. Rassen JA, Avorn J, Schneeweiss S. Multivariate-adjusted
    pharmacoepidemiologic analyses of confidential information pooled from
    multiple health care utilization databases. Pharmacoepidemiol Drug Saf.
    2010;19:848-857.
30. Molenberghs G, Fitzmaurice G, Kenward M, Tsiatis A, Verbeke G, eds.
    Handbook of Missing Data Methodology. Chapman & Hall/CRC; 2014.
31. Schafer JL, Graham JW. Missing data: our view of the state of the art.
    Psychol Methods. 2002;7:147-177.
32. van Buuren S. Flexible Imputation of Missing Data. 2nd ed. Chapman &
    Hall/CRC; 2018.
33. Amices/dsMice: DataSHIELD Server-side Functions for the Mice Package
    (version 0.2.0). 2020. Accessed 2020-11-03, at
    https://rdrr.io/github/amices/dsMice/
34. Carstensen B. Who Needs the Cox Model Anyway? (Version 7). C. Steno
    Diabetes Center; 2019.
35. Budin-Ljosne I, Burton P, Isaeva J, et al. DataSHIELD: an ethically
    robust solution to multiple-site individual-level data analysis. Public
    Health Genomics. 2015;18:87-96.
36. Gaye A, Marcon Y, Isaeva J, et al. DataSHIELD: taking the analysis to
    the data, not the data to the analysis. Int J Epidemiol.
    2014;43:1929-1944.
37. GNU General Public License v3 [Internet]. 2007. Accessed 2019-11-25, at
    https://www.gnu.org/licenses/gpl-3.0.html
38. Chatzidionysiou K, Hetland ML, Frisell T, et al. Opportunities and
    challenges for real-world studies on chronic inflammatory joint diseases
    through data enrichment and collaboration between national registers:
    the Nordic example. RMD Open. 2018;4:e000655.

SUPPORTING INFORMATION
Additional supporting information can be found online in the Supporting
Information section at the end of this article.

How to cite this article: Gedeborg R, Igl W, Svennblad B, et al. Federated
analyses of multiple data sources in drug safety studies. Pharmacoepidemiol
Drug Saf. 2023;32(3):279-286. doi:10.1002/pds.5587