The intensity $\lambda$ of a counting process is a measure of the rate of change of its predictable part. If a stochastic process $\{N(t), t \geq 0\}$ is a counting process, then it is a submartingale, and in particular its Doob-Meyer decomposition is

$$N(t) = M(t) + \Lambda(t),$$

where $M(t)$ is a martingale and $\Lambda(t)$ is a predictable increasing process. $\Lambda(t)$ is called the cumulative intensity of $N(t)$, and it is related to $\lambda$ by

$$\Lambda(t) = \int_0^t \lambda(s)\,ds.$$
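As a rough numerical illustration of the decomposition, the sketch below simulates an inhomogeneous Poisson process, which is one particular counting process, and checks that the sample mean of $N(t) - \Lambda(t)$ is close to zero, as expected for a martingale started at zero. The intensity $2 + \sin(s)$, the thinning method of simulation, and all function names are assumptions made for this example only.

```python
import numpy as np

def intensity(s):
    # Hypothetical intensity lambda(s) = 2 + sin(s), chosen for illustration.
    return 2.0 + np.sin(s)

def cumulative_intensity(t):
    # Closed form of Lambda(t) = integral_0^t (2 + sin(s)) ds.
    return 2.0 * t + 1.0 - np.cos(t)

def simulate_count(t_end, lam_max, rng):
    """Return N(t_end) for one path of the process, simulated by thinning."""
    # Candidate points come from a homogeneous Poisson process of rate lam_max.
    n_candidates = rng.poisson(lam_max * t_end)
    candidates = rng.uniform(0.0, t_end, size=n_candidates)
    # Keep each candidate s with probability lambda(s) / lam_max.
    u = rng.uniform(0.0, 1.0, size=n_candidates)
    return int(np.sum(u < intensity(candidates) / lam_max))

rng = np.random.default_rng(0)
t_end, lam_max, n_paths = 5.0, 3.0, 20000  # lam_max bounds 2 + sin(s) from above
counts = np.array([simulate_count(t_end, lam_max, rng) for _ in range(n_paths)])

# Sample mean of M(t_end) = N(t_end) - Lambda(t_end); should be near zero.
mean_martingale = counts.mean() - cumulative_intensity(t_end)
print(f"Lambda({t_end}) = {cumulative_intensity(t_end):.4f}")
print(f"mean of N - Lambda over {n_paths} paths = {mean_martingale:.4f}")
```

The check only looks at the mean at a single time point, so it is a sanity check of the compensator rather than a verification of the martingale property.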
Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a counting process $\{N(t), t \geq 0\}$ which is adapted to the filtration $\{\mathcal{F}_t, t \geq 0\}$, the intensity of $N$ is the process $\{\lambda(t), t \geq 0\}$ defined by the following limit:

$$\lambda(t) = \lim_{h \downarrow 0} \frac{1}{h}\,\mathbb{E}\left[N(t+h) - N(t) \mid \mathcal{F}_t\right].$$
The right-continuity property of counting processes allows us to take this limit from the right.
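The limit can be illustrated numerically in a simple special case. The sketch below assumes an inhomogeneous Poisson process with the hypothetical intensity $2 + \sin(t)$; for such a process the increments are independent of the past, so the conditional expectation reduces to an ordinary expectation and $N(t+h) - N(t)$ is Poisson distributed with mean $\Lambda(t+h) - \Lambda(t)$. The function names and parameter values are illustrative choices, not part of the definition.

```python
import numpy as np

def intensity(t):
    # Hypothetical intensity lambda(t) = 2 + sin(t), for illustration only.
    return 2.0 + np.sin(t)

def cumulative_intensity(t):
    # Closed form of Lambda(t) = integral_0^t (2 + sin(s)) ds.
    return 2.0 * t + 1.0 - np.cos(t)

def difference_quotient(t, h, n_draws, rng):
    """Monte Carlo estimate of E[N(t+h) - N(t)] / h for a Poisson process."""
    # For a Poisson process the increment over (t, t+h] is Poisson distributed
    # with mean Lambda(t+h) - Lambda(t), independently of the past.
    mean_increment = cumulative_intensity(t + h) - cumulative_intensity(t)
    increments = rng.poisson(mean_increment, size=n_draws)
    return increments.mean() / h

rng = np.random.default_rng(1)
t = 1.0
estimates = {h: difference_quotient(t, h, 1_000_000, rng) for h in (0.5, 0.1, 0.02)}

for h, est in estimates.items():
    print(f"h = {h:<5} difference quotient = {est:.4f}")
print(f"lambda({t}) = {intensity(t):.4f}")
```

As $h$ shrinks the difference quotient approaches $\lambda(t)$, although the Monte Carlo noise grows because fewer events fall in the shorter window.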
In statistical learning, the discrepancy between $\lambda$ and its estimator $\hat{\lambda}$ can be bounded using oracle inequalities.
If a counting process $N(t)$ is restricted to $t \in [0,1]$ and $n$ i.i.d. copies $N_1, N_2, \ldots, N_n$ are observed on that interval, then the least squares functional for the intensity is

$$R_n(\lambda) = \int_0^1 \lambda(t)^2\,dt - \frac{2}{n}\sum_{i=1}^n \int_0^1 \lambda(t)\,dN_i(t),$$

which involves an Itô integral. Suppose that $\lambda(t)$ is piecewise constant on $[0,1]$, i.e. it depends on a vector of constants $\beta = (\beta_1, \beta_2, \ldots, \beta_m) \in \mathbb{R}_+^m$ and can be written

$$\lambda_\beta = \sum_{j=1}^m \beta_j \lambda_{j,m}, \qquad \lambda_{j,m} = \sqrt{m}\,\mathbf{1}_{\left(\frac{j-1}{m},\,\frac{j}{m}\right]},$$

where the $\lambda_{j,m}$ have a factor of $\sqrt{m}$ so that they are orthonormal under the standard $L^2$ norm. Then, by choosing appropriate data-driven weights $\hat{w}_j$, which depend on a parameter $x > 0$, and introducing the weighted norm

$$\|\beta\|_{\hat{w}} = \sum_{j=2}^m \hat{w}_j\,|\beta_j - \beta_{j-1}|,$$

the estimator for $\beta$ can be given:

$$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}_+^m} \left\{ R_n(\lambda_\beta) + \|\beta\|_{\hat{w}} \right\}.$$
Then the estimator $\hat{\lambda}$ is just $\lambda_{\hat{\beta}}$. With these preliminaries, an oracle inequality bounding the $L^2$ norm $\|\hat{\lambda} - \lambda\|$ is as follows: for an appropriate choice of $\hat{w}_j(x)$,

$$\|\hat{\lambda} - \lambda\|^2 \leq \inf_{\beta \in \mathbb{R}_+^m} \left\{ \|\lambda_\beta - \lambda\|^2 + 2\|\beta\|_{\hat{w}} \right\}$$

with probability greater than or equal to $1 - 12.85\,e^{-x}$.
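To make the penalized criterion concrete: for piecewise-constant $\lambda_\beta$, orthonormality of the $\lambda_{j,m}$ gives $\int_0^1 \lambda_\beta(t)^2\,dt = \sum_j \beta_j^2$, and each integral against $dN_i$ reduces to $\sqrt{m}\sum_j \beta_j N_i(I_j)$, where $N_i(I_j)$ is the number of events of the $i$-th copy in $I_j = ((j-1)/m,\, j/m]$. The sketch below evaluates the least squares functional and the weighted penalty under that reduction. It does not solve the penalized minimization; it only computes the minimizer of the unpenalized functional, which is a rescaled histogram. The constant weight, the toy event times, and the function names are assumptions for illustration; the data-driven weights $\hat{w}_j$ are not specified here.

```python
import numpy as np

def bin_counts(event_times, m):
    """Count events of each path in the m bins ((j-1)/m, j/m]; shape (n, m)."""
    counts = np.zeros((len(event_times), m))
    for i, times in enumerate(event_times):
        # ceil(t * m) - 1 maps t in ((j-1)/m, j/m] to the zero-based index j-1.
        idx = np.ceil(np.asarray(times, dtype=float) * m).astype(int) - 1
        idx = np.clip(idx, 0, m - 1)
        np.add.at(counts[i], idx, 1.0)
    return counts

def least_squares(beta, counts):
    """R_n(lambda_beta) = sum_j beta_j^2 - (2 sqrt(m)/n) sum_j beta_j sum_i N_i(I_j)."""
    n, m = counts.shape
    return float(np.sum(beta**2) - 2.0 * np.sqrt(m) / n * np.dot(beta, counts.sum(axis=0)))

def weighted_penalty(beta, weights):
    """Weighted norm sum_{j=2}^m w_j |beta_j - beta_{j-1}|; weights has m-1 entries."""
    return float(np.sum(weights * np.abs(np.diff(beta))))

def unpenalized_beta(counts):
    """Minimizer of R_n alone: beta_j = (sqrt(m)/n) sum_i N_i(I_j)."""
    n, m = counts.shape
    return np.sqrt(m) * counts.sum(axis=0) / n

# Toy data (hypothetical): event times of n = 2 observed copies on [0, 1].
event_times = [[0.1, 0.3, 0.8], [0.2, 0.9]]
m = 2
counts = bin_counts(event_times, m)
beta_hist = unpenalized_beta(counts)
weights = np.array([0.5])  # hypothetical constant weight, not the data-driven one

objective = least_squares(beta_hist, counts) + weighted_penalty(beta_hist, weights)
print("bin counts:", counts.tolist())
print("unpenalized beta:", beta_hist)
print("penalized objective at that beta:", objective)
```

The penalty charges only the jumps $|\beta_j - \beta_{j-1}|$ between neighbouring bins, so larger weights push the minimizer toward intensities with fewer changes of level than the raw histogram.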