In many cases, the probit model contains unobserved heterogeneity. For instance, when modelling consumers' choice of a particular brand, each consumer's personal preference is unobserved but still needs to be accounted for in the model. Endogeneity can also arise because of omitted variables or measurement error. A probit model that includes both of these issues can be written as:

$$y_{it} = 1[y_{it}^{*} > 0]$$

$$y_{it}^{*} = x_{it}^{(1)}\beta + z_{it}\delta + c_i + u_{it}$$

$$z_{it} = x_{it}^{(1)}\gamma_1 + x_{it}^{(2)}\gamma_2 + v_{it}$$
where $z_{it}$ is the potentially endogenous regressor, $c_i$ is the unobserved heterogeneity effect, and $u_{it}\mid x_i \sim N(0,1)$, $v_{it}\mid x_i \sim N(0,\sigma^2)$.

If $v_{it}$ and $u_{it}$ are independent, the model reduces to a probit model with unobserved heterogeneity. In this case, integrating $P(y_{iT},\dots,y_{i0}\mid x_i, c_i)$ against the density of $c_i$ conditional on $x_i$ gives $P(y_{iT},\dots,y_{i0}\mid x_i)$, and the objective function for conditional maximum likelihood estimation is

$$\sum_{i=1}^{N} \log P(y_{iT},\dots,y_{i0}\mid x_i).$$
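As a rough illustration of the integration step, the following Python sketch evaluates $P(y_{iT},\dots,y_{i0}\mid x_i)$ for one unit and the log-likelihood objective. It goes beyond the text in two hypothetical assumptions: the $u_{it}$ are serially independent (so the conditional probability factorizes over $t$), and $c_i \mid x_i \sim N(0,\sigma_c^2)$. The linear index $x_{it}^{(1)}\beta + z_{it}\delta$ is passed in precomputed, and all function names are made up for illustration.

```python
import math
from typing import Sequence


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x: float) -> float:
    """Standard normal density, phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def individual_likelihood(y: Sequence[int], index: Sequence[float],
                          sigma_c: float, n_grid: int = 2001) -> float:
    """P(y_i0, ..., y_iT | x_i) for one unit.

    Hypothetical assumptions: u_it is serially independent, so
    P(y_i | x_i, c) = prod_t Phi((2*y_it - 1) * (index_it + c)),
    and c_i | x_i ~ N(0, sigma_c^2). The integral over c is approximated
    with a midpoint rule on [-8*sigma_c, 8*sigma_c].
    """
    lo, hi = -8.0 * sigma_c, 8.0 * sigma_c
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        c = lo + (k + 0.5) * h
        # Conditional probability of the whole sequence given c.
        cond = 1.0
        for y_t, idx_t in zip(y, index):
            cond *= std_normal_cdf((2 * y_t - 1) * (idx_t + c))
        # Weight by the N(0, sigma_c^2) density of c.
        total += cond * std_normal_pdf(c / sigma_c) / sigma_c * h
    return total


def log_likelihood(ys, indices, sigma_c: float) -> float:
    """Objective: sum over units i of log P(y_i | x_i)."""
    return sum(math.log(individual_likelihood(y, idx, sigma_c))
               for y, idx in zip(ys, indices))
```

With a single period the integral has a closed form, $\Phi\big(a/\sqrt{1+\sigma_c^2}\big)$ for index $a$, which is a convenient check. The midpoint rule only keeps the sketch dependency-free; Gauss–Hermite quadrature is a common alternative for this integral.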
If $v_{it}$ and $u_{it}$ are correlated, then under the normality assumption we can write $v_{it} = \rho u_{it} + \epsilon_{it}$, where $\epsilon_{it} \overset{\text{iid}}{\sim} N(0, \sigma^2 - \rho^2)$ and $\epsilon_i$ is independent of $u_i$. Then the model can be rewritten as:

$$y_{it} = 1\left[x_{it}^{(1)}(\beta + \delta\gamma_1) + x_{it}^{(2)}\delta\gamma_2 + c_i + \omega_{it} > 0\right]$$
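where $\omega_{it} = (1+\rho\delta)u_{it} + \delta\epsilon_{it}$, $\omega_{it} \sim N\left(0, (1+\rho\delta)^2 + \delta^2(\sigma^2-\rho^2)\right)$, and

$$\operatorname{corr}(\omega_{it}, \omega_{i,t-s}) = \frac{(1+\rho\delta)^2 \operatorname{corr}(u_{it}, u_{i,t-s})}{(1+\rho\delta)^2 + \delta^2(\sigma^2-\rho^2)}.$$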
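The rewritten model follows from substituting the first-stage equation and the decomposition of $v_{it}$ into the latent equation. The variance and covariance of $\omega_{it}$ use $\operatorname{Var}(u_{it}) = 1$, the independence of $\epsilon_i$ and $u_i$, and the fact that $\epsilon_{it}$ is iid over $t$:

```latex
\begin{aligned}
y_{it}^{*} &= x_{it}^{(1)}\beta + \left(x_{it}^{(1)}\gamma_1 + x_{it}^{(2)}\gamma_2 + \rho u_{it} + \epsilon_{it}\right)\delta + c_i + u_{it} \\
&= x_{it}^{(1)}(\beta + \delta\gamma_1) + x_{it}^{(2)}\delta\gamma_2 + c_i + \underbrace{(1+\rho\delta)u_{it} + \delta\epsilon_{it}}_{\omega_{it}}, \\
\operatorname{Var}(\omega_{it}) &= (1+\rho\delta)^2\operatorname{Var}(u_{it}) + \delta^2\operatorname{Var}(\epsilon_{it}) = (1+\rho\delta)^2 + \delta^2(\sigma^2-\rho^2), \\
\operatorname{Cov}(\omega_{it},\omega_{i,t-s}) &= (1+\rho\delta)^2\operatorname{Cov}(u_{it},u_{i,t-s}) \qquad (s \neq 0).
\end{aligned}
```

Dividing the covariance by $\operatorname{Var}(\omega_{it})$ gives the correlation formula; because $\operatorname{Var}(u_{it}) = 1$, $\operatorname{Cov}(u_{it},u_{i,t-s}) = \operatorname{corr}(u_{it},u_{i,t-s})$.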
Based on this, the same maximum likelihood procedure consistently estimates the scaled parameters

$$\frac{(\beta + \delta\gamma_1,\ \delta\gamma_2)}{\sqrt{(1+\rho\delta)^2 + \delta^2(\sigma^2-\rho^2)}},$$

and the average partial effects (APEs) can then be consistently estimated from them.
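A minimal Python sketch of this last step, under hypothetical assumptions not in the text: one scalar covariate in each of $x^{(1)}$ and $x^{(2)}$, made-up parameter values, and, for the APE only, $c_i \sim N(0,\sigma_c^2)$ independent of $x_i$. It computes the scale factor $s$ and the scaled parameters, checks by simulation from the structural equations that $P(y_{it}=1\mid x_{it}, c_i) = \Phi(x_{it}^{(1)} b_1 + x_{it}^{(2)} b_2 + c_i/s)$, and evaluates the APE of $x^{(1)}$. Function names are illustrative only.

```python
import math
import random


def std_normal_cdf(x: float) -> float:
    """Phi(x), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x: float) -> float:
    """phi(x), the standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def scaled_parameters(beta, delta, gamma1, gamma2, rho, sigma2):
    """Return (b1, b2, s): b1 = (beta + delta*gamma1)/s, b2 = delta*gamma2/s,
    where s = sqrt((1 + rho*delta)^2 + delta^2*(sigma2 - rho^2)) is the
    standard deviation of omega_it."""
    s = math.sqrt((1.0 + rho * delta) ** 2 + delta ** 2 * (sigma2 - rho ** 2))
    return (beta + delta * gamma1) / s, delta * gamma2 / s, s


def simulate_choice_prob(x1, x2, c, beta, delta, gamma1, gamma2, rho, sigma2,
                         n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(y_it = 1 | x_it, c_i) drawn directly from
    the structural equations (requires sigma2 >= rho^2)."""
    rng = random.Random(seed)
    sd_eps = math.sqrt(sigma2 - rho ** 2)
    hits = 0
    for _ in range(n_draws):
        u = rng.gauss(0.0, 1.0)                   # u_it ~ N(0, 1)
        v = rho * u + rng.gauss(0.0, sd_eps)      # v_it = rho*u_it + eps_it
        z = x1 * gamma1 + x2 * gamma2 + v         # first-stage equation
        y_star = x1 * beta + z * delta + c + u    # latent equation
        hits += y_star > 0
    return hits / n_draws


def ape_x1(b1, b2, tau, sample):
    """APE of a continuous x1, averaged over a sample of (x1, x2) pairs.
    Hypothetical assumption: c_i ~ N(0, sigma_c^2) independent of x, with
    tau = sigma_c / s, so P(y=1 | x) = Phi((x1*b1 + x2*b2) / sqrt(1 + tau^2))."""
    k = math.sqrt(1.0 + tau ** 2)
    return sum(b1 / k * std_normal_pdf((x1 * b1 + x2 * b2) / k)
               for x1, x2 in sample) / len(sample)


# Illustrative (made-up) parameter values.
b1, b2, s = scaled_parameters(beta=0.5, delta=0.8, gamma1=0.3, gamma2=1.0,
                              rho=0.4, sigma2=1.0)
p_sim = simulate_choice_prob(0.5, -0.2, 0.1, 0.5, 0.8, 0.3, 1.0, 0.4, 1.0)
p_model = std_normal_cdf(0.5 * b1 - 0.2 * b2 + 0.1 / s)
print(round(p_sim, 3), round(p_model, 3))
```

The simulated and model-implied probabilities should agree up to Monte Carlo error, which illustrates that only the scaled combination of $\beta$, $\delta$, $\gamma_1$, $\gamma_2$, $\rho$ and $\sigma^2$ enters the choice probabilities.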